The Idea
When you ask AI to do something complex, most agent patterns work through it one step at a time: think, act, observe, think, act, observe. Even patterns like ReWOO that plan ahead still execute tool calls one after another in sequence.
LLMCompiler thinks like a real compiler. It reads your request, identifies all the tasks needed, maps out which ones depend on which, and then runs every independent task simultaneously. Three independent searches? Fire them all at once. A comparison that needs all three results? It waits only for those, then runs immediately.
Even better, it starts executing tasks while still planning the rest. The moment the first independent task is identified, it's already running — no need to wait for the full plan to finish.
Building Blocks
This composition builds on:
- Plan-and-Execute
- ReWOO

LLMCompiler takes Plan-and-Execute's separation of planning and execution, ReWOO's placeholder variables and batch approach, and adds true parallel execution plus streaming overlap between planning and execution.
The Four Components
Planner
AI generates a numbered list of tasks with explicit dependencies. Streams tasks out as they're generated — execution starts before planning finishes.
Task Fetching Unit
Monitors which tasks have all their dependencies satisfied. Moves ready tasks into the execution queue the moment they're unblocked.
Executor
Runs all ready tasks simultaneously. Multiple tool calls fire at once. No waiting in line when tasks are independent.
Joiner
Reviews all results when execution is done. Either returns the final answer or triggers a new round of planning if more information is needed.
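The first two components can be sketched in a few lines. This is a minimal illustration, not any particular library's API: `Task` and `ready_tasks` are hypothetical names. Each task records which earlier tasks it depends on, and the Task Fetching Unit is essentially a filter over tasks whose dependencies are all satisfied.

```python
from dataclasses import dataclass

@dataclass
class Task:
    idx: int                        # task number, e.g. 1 for "$1"
    tool: str                       # which tool to call
    args: tuple                     # arguments; may contain "$n" placeholders
    deps: frozenset = frozenset()   # indices of tasks whose results we need

def ready_tasks(tasks, done):
    """Task Fetching Unit: return tasks whose dependencies are all done."""
    return [t for t in tasks if t.idx not in done and t.deps <= done]

# The weather-comparison example as a dependency graph:
plan = [
    Task(1, "search", ("New York weather",)),
    Task(2, "search", ("Los Angeles weather",)),
    Task(3, "search", ("London weather",)),
    Task(4, "compare", ("$1", "$2", "$3"), frozenset({1, 2, 3})),
]

print([t.idx for t in ready_tasks(plan, set())])        # tasks 1, 2, 3 are ready at once
print([t.idx for t in ready_tasks(plan, {1, 2, 3})])    # task 4 unblocks only then
```

With nothing done, tasks 1-3 are all ready simultaneously; task 4 becomes ready only once the set of finished tasks covers its dependencies.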
See It in Action
Question: "Compare the current weather in New York, Los Angeles, and London."
Task 1: search("New York weather") → $1 [no dependencies]
Task 2: search("Los Angeles weather") → $2 [no dependencies]
Task 3: search("London weather") → $3 [no dependencies]
Task 4: compare($1, $2, $3) → $4 [depends on 1, 2, 3]
Tasks 1, 2, and 3 start running immediately as the planner streams them — no waiting for the full plan.
Search "New York weather" → …
Search "Los Angeles weather" → 85°F, Clear
Search "London weather" → 59°F, Cloudy
All three finish in ~1 second total instead of ~3 seconds sequential.
Why Parallel Matters
Sequential (ReAct-style)
Total: ~4 seconds
Parallel (LLMCompiler)
Total: ~1.5 seconds — nearly 3x faster
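The gap is easy to reproduce in a toy measurement, with `asyncio.sleep` standing in for tool latency. Three 0.1-second tools take at least 0.3 s run back to back but only about 0.1 s run concurrently:

```python
import asyncio
import time

async def tool(name, latency=0.1):
    await asyncio.sleep(latency)   # stand-in for a real tool call
    return name

async def sequential():
    # ReAct-style: each call waits for the previous one.
    return [await tool(c) for c in ("NY", "LA", "London")]

async def parallel():
    # LLMCompiler-style: all independent calls fire at once.
    return await asyncio.gather(*(tool(c) for c in ("NY", "LA", "London")))

t0 = time.perf_counter(); asyncio.run(sequential()); seq = time.perf_counter() - t0
t0 = time.perf_counter(); asyncio.run(parallel()); par = time.perf_counter() - t0
print(f"sequential: {seq:.2f}s, parallel: {par:.2f}s")
```

The speedup scales with the number of independent tasks, which is why the gains grow for fan-out-heavy queries.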
How It Compares
ReAct
Think → act → observe → repeat. Fully sequential. Every step waits for the previous one. Most flexible, but slowest and most expensive.
ReWOO
Plan all at once → execute sequentially → synthesize. Saves on AI calls, but tools still run one at a time. Cost-optimized but not speed-optimized.
LLMCompiler
Plan as a dependency graph → execute independent tasks in parallel → synthesize. Saves both cost and time. Up to 3.7x faster, 6.7x cheaper than ReAct.
Why This Works
Most multi-step tasks contain hidden parallelism. "Compare weather in three cities" requires three independent lookups — there's no reason to do them one at a time. By expressing the plan as a dependency graph rather than a flat list, LLMCompiler discovers exactly which tasks can overlap.
The streaming trick adds another layer of speed: the planner doesn't even need to finish writing the full plan before execution begins. As soon as it emits a task with no dependencies, that task is already running. Planning and execution happen simultaneously.
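The streaming overlap can be sketched by treating the planner as an async generator and scheduling each task the moment it is emitted. The planner and searches here are hypothetical stand-ins with fixed delays; the point is that `asyncio.create_task` starts work immediately while the planner is still producing the rest of the plan.

```python
import asyncio

async def planner():
    """Stand-in streaming planner: emits one task at a time as it 'writes' the plan."""
    for query in ("New York weather", "Los Angeles weather", "London weather"):
        await asyncio.sleep(0.05)          # time spent generating the next task
        yield query

async def search(query):
    await asyncio.sleep(0.1)               # tool latency
    return f"result:{query}"

async def compile_and_run():
    running = []
    # Each independent task starts the moment the planner emits it;
    # execution overlaps with the planner still writing the plan.
    async for task in planner():
        running.append(asyncio.create_task(search(task)))
    return await asyncio.gather(*running)

cities = asyncio.run(compile_and_run())
print(cities)
```

By the time the planner emits the third task, the first search is already most of the way done, so total latency approaches max(planning time, slowest task) instead of their sum.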
The Composition
Turn every request into a dependency graph. Fire independent tasks in parallel. Start executing before planning finishes. The result: dramatically faster and cheaper than sequential approaches.
When to Use This
- Tasks with independent sub-components that can run at the same time (multiple searches, lookups, API calls)
- Speed-critical applications where users are waiting for results
- Batch data processing and aggregation across multiple sources
- Complex queries that naturally decompose into a graph of sub-tasks
When to Skip This
- Purely sequential tasks — if every step truly depends on the one before it, there's nothing to parallelize
- Simple single-step queries — the planning overhead isn't worth it for one tool call
- Highly exploratory tasks — if you can't know what to do next until you see the previous result, a reactive pattern is better
How It Relates
LLMCompiler is a speed-and-cost-optimized evolution of Plan-and-Execute and ReWOO. Plan-and-Execute separates planning from execution but still runs steps sequentially. ReWOO reduces AI calls by using placeholders but executes tools one at a time. LLMCompiler adds the final piece: true parallel execution of independent tasks.
For tasks with lots of independent sub-components (like searching multiple sources), the speedup can be dramatic — nearly linear with the number of parallel tasks. For tasks that are inherently sequential, it gracefully falls back to sequential execution, behaving much like Plan-and-Execute.