The Idea

When you ask AI to do something complex, most agent patterns work through it one step at a time: think, act, observe, think, act, observe. Even patterns like ReWOO that plan ahead still execute tool calls one after another in sequence.

LLMCompiler thinks like a real compiler. It reads your request, identifies all the tasks needed, maps out which ones depend on which, and then runs every independent task simultaneously. Three independent searches? Fire them all at once. A comparison that needs all three results? It waits only for those, then runs immediately.

Even better, it starts executing tasks while still planning the rest. The moment the first independent task is identified, it's already running — no need to wait for the full plan to finish.
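The core scheduling idea can be sketched in a few lines of Python with `asyncio` (a minimal illustration, not the actual LLMCompiler implementation; the `run_graph` shape and task format here are assumptions):

```python
import asyncio

async def run_graph(tasks):
    """Run {task_id: (deps, fn)} respecting dependencies.

    fn takes a dict of dependency results and returns an awaitable.
    Independent tasks run concurrently; each task starts the moment
    all of its dependencies finish.
    """
    results = {}
    done = {tid: asyncio.Event() for tid in tasks}

    async def run_one(tid):
        deps, fn = tasks[tid]
        for d in deps:                    # wait only on *this* task's deps
            await done[d].wait()
        results[tid] = await fn({d: results[d] for d in deps})
        done[tid].set()

    await asyncio.gather(*(run_one(t) for t in tasks))
    return results
```

Tasks with no dependencies start immediately; a dependent task blocks only on the events it actually needs, never on the whole batch.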

Building Blocks

This composition builds on:

Plan-and-Execute and ReWOO

LLMCompiler takes Plan-and-Execute's separation of planning and execution, ReWOO's placeholder variables and batch approach, and adds true parallel execution plus streaming overlap between planning and execution.

The Four Components

Planner

AI generates a numbered list of tasks with explicit dependencies. Streams tasks out as they're generated — execution starts before planning finishes.

Task Fetching Unit

Monitors which tasks have all their dependencies satisfied. Moves ready tasks into the execution queue the moment they're unblocked.
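In code, the fetching unit reduces to a simple readiness check (sketch; the `deps`/`done`/`scheduled` structures are illustrative, not a real API):

```python
def fetch_ready(deps, done, scheduled):
    """Return task ids whose dependencies are all complete.

    deps      -- {task_id: set of task_ids it depends on}
    done      -- set of completed task ids
    scheduled -- set of ids already running or finished
    """
    return [
        tid for tid, needs in deps.items()
        if tid not in scheduled and needs <= done
    ]
```

Called after every task completion, this moves newly unblocked tasks into the queue as soon as possible.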

Executor

Runs all ready tasks simultaneously. Multiple tool calls fire at once. No waiting in line when tasks are independent.
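With Python's `asyncio`, running every ready task at once is a single `gather` call (sketch; `search` is a hypothetical stand-in for a real tool call):

```python
import asyncio

async def search(query):
    # stand-in tool; a real implementation would hit a search API
    await asyncio.sleep(0.1)
    return f"results for {query!r}"

async def execute_batch(queries):
    # fire all independent tool calls concurrently
    return await asyncio.gather(*(search(q) for q in queries))
```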

Joiner

Reviews all results when execution is done. Either returns the final answer or triggers a new round of planning if more information is needed.
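The joiner's decision can be modeled as a small either/or result (toy sketch; in the actual pattern an LLM makes this judgment):

```python
from dataclasses import dataclass

@dataclass
class JoinDecision:
    finish: bool      # True: return the answer; False: trigger replanning
    content: str      # final answer, or feedback for the next planning round

def join(results):
    """Toy joiner: finish if every task produced a result, else replan."""
    missing = [tid for tid, value in results.items() if value is None]
    if missing:
        return JoinDecision(finish=False, content=f"replan: tasks {missing} gave no result")
    return JoinDecision(finish=True, content="; ".join(str(v) for v in results.values()))
```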

See It in Action

Question: "Compare the current weather in New York, Los Angeles, and London."

Step 1: Planner (streaming)
AI maps out tasks with dependencies.

The task graph:
Task 1: search("New York weather") → $1    [no dependencies]
Task 2: search("Los Angeles weather") → $2    [no dependencies]
Task 3: search("London weather") → $3    [no dependencies]
Task 4: compare($1, $2, $3) → $4    [depends on 1, 2, 3]

Tasks 1, 2, and 3 start running immediately as the planner streams them — no waiting for the full plan.
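The plan above can be written down as plain data, with `$n` placeholders resolved from earlier results (an illustrative serialization, not LLMCompiler's exact format):

```python
import re

# the plan above, written as (id, tool, args, deps)
plan = [
    (1, "search", ["New York weather"], []),
    (2, "search", ["Los Angeles weather"], []),
    (3, "search", ["London weather"], []),
    (4, "compare", ["$1", "$2", "$3"], [1, 2, 3]),
]

def resolve(args, results):
    """Substitute $n placeholders with the output of task n."""
    return [
        results[int(a[1:])] if re.fullmatch(r"\$\d+", str(a)) else a
        for a in args
    ]
```

When task 4 becomes ready, its `$1`–`$3` arguments are swapped for the three search results before the `compare` tool runs.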

↓ independent tasks run in parallel
Step 2: Executor (parallel)
Three searches fire simultaneously: tasks 1, 2, and 3 all at the same time.
Search "New York weather" → 72°F, Sunny
Search "Los Angeles weather" → 85°F, Clear
Search "London weather" → 59°F, Cloudy

All three finish in ~1 second total instead of ~3 seconds sequentially.

↓ all dependencies met
Step 3: Executor → Joiner
The comparison runs, then the Joiner returns the answer.

Final answer:
Here's the weather comparison: New York is 72°F and sunny, Los Angeles is the warmest at 85°F with clear skies, and London is the coolest at 59°F with clouds. If you want warmth, head to LA!

Why Parallel Matters

Sequential (ReAct-style)

0–1s: Search NYC
1–2s: Search LA
2–3s: Search London
3–4s: Compare all three

Total: ~4 seconds


Parallel (LLMCompiler)

0–1s: Search NYC + Search LA + Search London (all at once)
1–1.5s: Compare all three

Total: ~1.5 seconds — nearly 3x faster
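The timing difference is easy to demonstrate with simulated tool calls (hypothetical `fake_search`, each call taking ~0.2 s):

```python
import asyncio
import time

async def fake_search(city):
    await asyncio.sleep(0.2)     # stands in for a ~0.2 s tool call
    return f"{city}: ok"

async def sequential(cities):
    # one call at a time, ReAct-style
    return [await fake_search(c) for c in cities]

async def parallel(cities):
    # all calls at once, LLMCompiler-style
    return await asyncio.gather(*(fake_search(c) for c in cities))
```

With three cities, the sequential version takes roughly three times the single-call latency, while the parallel version takes roughly one.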

How It Compares

ReAct

Think → act → observe → repeat. Fully sequential. Every step waits for the previous one. Most flexible, but slowest and most expensive.

ReWOO

Plan all at once → execute sequentially → synthesize. Saves on AI calls, but tools still run one at a time. Cost-optimized but not speed-optimized.

LLMCompiler

Plan as a dependency graph → execute independent tasks in parallel → synthesize. Saves both cost and time. Up to 3.7x faster and 6.7x cheaper than ReAct.

Why This Works

Most multi-step tasks contain hidden parallelism. "Compare weather in three cities" requires three independent lookups — there's no reason to do them one at a time. By expressing the plan as a dependency graph rather than a flat list, LLMCompiler discovers exactly which tasks can overlap.

The streaming trick adds another layer of speed: the planner doesn't even need to finish writing the full plan before execution begins. As soon as it emits a task with no dependencies, that task is already running. Planning and execution happen simultaneously.
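The overlap can be sketched with an async generator standing in for the streaming planner (all names here are illustrative, not a real API):

```python
import asyncio

async def streaming_planner():
    """Yields (task_id, deps) one at a time, as a streaming LLM would."""
    for task_id, deps in [(1, []), (2, []), (3, [1, 2])]:
        await asyncio.sleep(0.05)          # later plan tokens still streaming
        yield task_id, deps

async def run_task(task_id, deps, done, results):
    for d in deps:
        await done[d].wait()               # block only on real dependencies
    results[task_id] = f"task {task_id} done"
    done[task_id].set()

async def plan_and_run():
    results, done, running = {}, {}, []
    async for task_id, deps in streaming_planner():
        done[task_id] = asyncio.Event()
        # launch immediately: execution overlaps with planning
        running.append(asyncio.create_task(run_task(task_id, deps, done, results)))
    await asyncio.gather(*running)
    return results
```

Tasks 1 and 2 are already running while the planner is still emitting task 3, which is exactly the planning/execution overlap described above.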

The Composition

Turn every request into a dependency graph. Fire independent tasks in parallel. Start executing before planning finishes. The result: dramatically faster and cheaper than sequential approaches.

When to Use This

Requests with many independent sub-tasks (multiple searches, lookups across several sources, fan-out comparisons), where running tools in parallel pays off in both speed and cost.

When to Skip This

Tasks that are inherently sequential, where each step depends on the previous one. The dependency graph degenerates into a chain, and a simpler pattern such as Plan-and-Execute works just as well.

How It Relates

LLMCompiler is a speed-and-cost-optimized evolution of Plan-and-Execute and ReWOO. Plan-and-Execute separates planning from execution but still runs steps sequentially. ReWOO reduces AI calls by using placeholders but executes tools one at a time. LLMCompiler adds the final piece: true parallel execution of independent tasks.

For tasks with lots of independent sub-components (like searching multiple sources), the speedup can be dramatic — nearly linear with the number of parallel tasks. For tasks that are inherently sequential, it gracefully falls back to sequential execution, behaving much like Plan-and-Execute.