The Idea
When you ask AI to do something complex, most agent patterns work through it one step at a time: think, act, observe, think, act, observe. Even patterns like ReWOO that plan ahead still execute tool calls one after another in sequence.
LLMCompiler thinks like a real compiler. It reads your request, identifies all the tasks needed, maps out which ones depend on which, and then runs every independent task simultaneously. Three independent searches? Fire them all at once. A comparison that needs all three results? It waits only for those, then runs immediately.
Even better, it starts executing tasks while still planning the rest. The moment the first independent task is identified, it's already running — no need to wait for the full plan to finish.
Building Blocks
This composition builds on:
- Plan-and-Execute
- ReWOO

LLMCompiler takes Plan-and-Execute's separation of planning and execution, ReWOO's placeholder variables and batch approach, and adds true parallel execution plus streaming overlap between planning and execution.
The Four Components
Planner
AI generates a numbered list of tasks with explicit dependencies. Streams tasks out as they're generated — execution starts before planning finishes.
Task Fetching Unit
Monitors which tasks have all their dependencies satisfied. Moves ready tasks into the execution queue the moment they're unblocked.
Executor
Runs all ready tasks simultaneously. Multiple tool calls fire at once. No waiting in line when tasks are independent.
Joiner
Reviews all results when execution is done. Either returns the final answer or triggers a new round of planning if more information is needed.
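The first two components can be sketched in a few lines. This is a minimal illustration, not any particular library's API: `Task` and `ready_tasks` are hypothetical names. Each task records which earlier tasks it depends on, and the Task Fetching Unit is essentially a filter over tasks whose dependencies are all satisfied.

```python
from dataclasses import dataclass

@dataclass
class Task:
    idx: int                        # task number, e.g. 1 for "$1"
    tool: str                       # which tool to call
    args: tuple                     # arguments; may contain "$n" placeholders
    deps: frozenset = frozenset()   # indices of tasks whose results we need

def ready_tasks(tasks, done):
    """Task Fetching Unit: return tasks whose dependencies are all done."""
    return [t for t in tasks if t.idx not in done and t.deps <= done]

# The weather-comparison example as a dependency graph:
plan = [
    Task(1, "search", ("New York weather",)),
    Task(2, "search", ("Los Angeles weather",)),
    Task(3, "search", ("London weather",)),
    Task(4, "compare", ("$1", "$2", "$3"), frozenset({1, 2, 3})),
]

print([t.idx for t in ready_tasks(plan, set())])        # tasks 1, 2, 3 are ready at once
print([t.idx for t in ready_tasks(plan, {1, 2, 3})])    # task 4 unblocks only then
```

With nothing done, tasks 1-3 are all ready simultaneously; task 4 becomes ready only once the set of finished tasks covers its dependencies.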
See It in Action
Question: "Compare the current weather in New York, Los Angeles, and London."
Task 1: search("New York weather") → $1 [no dependencies]
Task 2: search("Los Angeles weather") → $2 [no dependencies]
Task 3: search("London weather") → $3 [no dependencies]
Task 4: compare($1, $2, $3) → $4 [depends on 1, 2, 3]
Tasks 1, 2, and 3 start running immediately as the planner streams them — no waiting for the full plan.
Search "New York weather" → …
Search "Los Angeles weather" → 85°F, Clear
Search "London weather" → 59°F, Cloudy
All three finish in ~1 second total instead of ~3 seconds sequential.
Why Parallel Matters
Sequential (ReAct-style)
Total: ~4 seconds
Parallel (LLMCompiler)
Total: ~1.5 seconds — nearly 3x faster
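The gap is easy to reproduce in a toy measurement, with `asyncio.sleep` standing in for tool latency. Three 0.1-second tools take at least 0.3 s run back to back but only about 0.1 s run concurrently:

```python
import asyncio
import time

async def tool(name, latency=0.1):
    await asyncio.sleep(latency)   # stand-in for a real tool call
    return name

async def sequential():
    # ReAct-style: each call waits for the previous one.
    return [await tool(c) for c in ("NY", "LA", "London")]

async def parallel():
    # LLMCompiler-style: all independent calls fire at once.
    return await asyncio.gather(*(tool(c) for c in ("NY", "LA", "London")))

t0 = time.perf_counter(); asyncio.run(sequential()); seq = time.perf_counter() - t0
t0 = time.perf_counter(); asyncio.run(parallel()); par = time.perf_counter() - t0
print(f"sequential: {seq:.2f}s, parallel: {par:.2f}s")
```

The speedup scales with the number of independent tasks, which is why the gains grow for fan-out-heavy queries.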
How It Compares
ReAct
Think → act → observe → repeat. Fully sequential. Every step waits for the previous one. Most flexible, but slowest and most expensive.
ReWOO
Plan all at once → execute sequentially → synthesize. Saves on AI calls, but tools still run one at a time. Cost-optimized but not speed-optimized.
LLMCompiler
Plan as a dependency graph → execute independent tasks in parallel → synthesize. Saves both cost and time. Up to 3.7x faster, 6.7x cheaper than ReAct.
Why This Works
Most multi-step tasks contain hidden parallelism. "Compare weather in three cities" requires three independent lookups — there's no reason to do them one at a time. By expressing the plan as a dependency graph rather than a flat list, LLMCompiler discovers exactly which tasks can overlap.
The streaming trick adds another layer of speed: the planner doesn't even need to finish writing the full plan before execution begins. As soon as it emits a task with no dependencies, that task is already running. Planning and execution happen simultaneously.
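The streaming overlap can be sketched by treating the planner as an async generator and scheduling each task the moment it is emitted. The planner and searches here are hypothetical stand-ins with fixed delays; the point is that `asyncio.create_task` starts work immediately while the planner is still producing the rest of the plan.

```python
import asyncio

async def planner():
    """Stand-in streaming planner: emits one task at a time as it 'writes' the plan."""
    for query in ("New York weather", "Los Angeles weather", "London weather"):
        await asyncio.sleep(0.05)          # time spent generating the next task
        yield query

async def search(query):
    await asyncio.sleep(0.1)               # tool latency
    return f"result:{query}"

async def compile_and_run():
    running = []
    # Each independent task starts the moment the planner emits it;
    # execution overlaps with the planner still writing the plan.
    async for task in planner():
        running.append(asyncio.create_task(search(task)))
    return await asyncio.gather(*running)

cities = asyncio.run(compile_and_run())
print(cities)
```

By the time the planner emits the third task, the first search is already most of the way done, so total latency approaches max(planning time, slowest task) instead of their sum.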
The Composition
Turn every request into a dependency graph. Fire independent tasks in parallel. Start executing before planning finishes. The result: dramatically faster and cheaper than sequential approaches.
When to Use This
- Tasks with independent sub-components that can run at the same time (multiple searches, lookups, API calls)
- Speed-critical applications where users are waiting for results
- Batch data processing and aggregation across multiple sources
- Complex queries that naturally decompose into a graph of sub-tasks
When to Skip This
- Purely sequential tasks — if every step truly depends on the one before it, there's nothing to parallelize
- Simple single-step queries — the planning overhead isn't worth it for one tool call
- Highly exploratory tasks — if you can't know what to do next until you see the previous result, a reactive pattern is better
How It Relates
LLMCompiler is a speed-and-cost-optimized evolution of Plan-and-Execute and ReWOO. Plan-and-Execute separates planning from execution but still runs steps sequentially. ReWOO reduces AI calls by using placeholders but executes tools one at a time. LLMCompiler adds the final piece: true parallel execution of independent tasks.
For tasks with lots of independent sub-components (like searching multiple sources), the speedup can be dramatic — nearly linear with the number of parallel tasks. For tasks that are inherently sequential, it gracefully falls back to sequential execution, behaving much like Plan-and-Execute.