The Idea
Most AI agents solve each problem from scratch. They don't remember what worked before or build on past successes. Voyager is different — it's a self-teaching agent that continuously learns new skills, stores them, and reuses them. Each solved problem becomes a building block for harder problems.
Originally demonstrated in Minecraft (where it taught itself to go from chopping wood to crafting diamond tools with zero human intervention), the pattern generalizes to any domain with executable feedback. The agent decides what to learn next, writes code to accomplish it, tests it, refines until it works, and stores the verified solution in a growing skill library.
Component Patterns
This system composes Level 2 patterns into a learning loop:
- Meta-Prompting generates the curriculum (what to learn next).
- RAG retrieves relevant existing skills.
- Reflexion handles the iterative refinement loop.
- Program of Thoughts grounds everything in executable code.
Three Interlocking Loops
Curriculum
"What should I learn next?" The agent examines its current capabilities and proposes the next challenge — achievable, novel, and progressive.
Skill Creation
Write code to accomplish the task. Execute it. If it fails, read the error, refine the code, and try again. Repeat until it works or give up.
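The write-execute-refine loop reduces to a short control structure. In this sketch, `generate_code` stands in for an LLM call and `execute` for the environment; both names are assumptions for illustration.

```python
def refine_until_success(task, generate_code, execute, max_attempts=4):
    """Iteratively regenerate code, feeding execution errors back as context."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(task, feedback)
        ok, result = execute(code)
        if ok:
            return code, attempt          # verified skill
        feedback = result                 # error message guides the next try
    return None, max_attempts             # give up after the attempt budget

# Fake components to demonstrate the flow: the third draft "works".
drafts = iter(["bad_code_1", "bad_code_2", "good_code"])
gen = lambda task, feedback: next(drafts)
run = lambda code: (code == "good_code",
                    "ok" if code == "good_code" else "NameError: ironOreBlock")

code, attempts = refine_until_success("mine iron", gen, run)
```

The key design choice is that only code passing `execute` ever leaves this loop, so everything downstream (the skill library) contains verified solutions.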
Skill Library
Verified solutions are stored with descriptions and embeddings. Future tasks retrieve similar past skills as context and building blocks.
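Storage and retrieval can be sketched with a toy bag-of-words embedding standing in for a real embedding model; the class and function names here are illustrative, not Voyager's actual API.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would use a vector model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SkillLibrary:
    def __init__(self):
        self.skills = []  # (description, code, embedding) triples

    def add(self, description, code):
        self.skills.append((description, code, embed(description)))

    def retrieve(self, task, k=2):
        """Return the k stored skills most similar to the task description."""
        q = embed(task)
        ranked = sorted(self.skills, key=lambda s: cosine(q, s[2]), reverse=True)
        return [(desc, code) for desc, code, _ in ranked[:k]]

lib = SkillLibrary()
lib.add("chop wood from a tree", "function chopWood() { /* ... */ }")
lib.add("mine iron ore with a stone pickaxe", "function mineIron() { /* ... */ }")
top = lib.retrieve("mine iron", k=1)
```

Retrieved skills are injected into the prompt for the next task, so similar past solutions arrive as both context and callable building blocks.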
See It in Action
A Minecraft agent teaching itself progressively harder skills.
Attempt 1: "ironOreBlock is not defined" → feedback
Attempt 2: "findBlock is not a function" → feedback
Attempt 3: Uses correct API → SUCCESS. Skill added.
The Growing Skill Library
Each skill is verified code that can be retrieved, composed, and reused.
The Numbers
After 160 iterations in Minecraft, skill accumulation is the difference maker: the agent that stores and reuses verified skills progresses far further than one that starts from scratch each time.
Why This Works
The skill library is the breakthrough. Without it, the agent starts from zero every time — like a student who never takes notes. With the library, solved problems become tools: the agent retrieves relevant past solutions as context and building blocks, dramatically reducing the work needed for new but similar challenges.
The automatic curriculum is equally important. Rather than following a fixed sequence, the agent assesses what it can currently do and proposes the most useful next challenge. This self-directed learning means it focuses effort where it has the highest chance of making progress.
The System
Decide what to learn. Write code to do it. Test, refine, verify. Store the working solution. Retrieve and compose past solutions for harder challenges. A self-teaching agent with a growing skill library.
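Those five steps compose into one loop. This is a hedged end-to-end sketch: every component here is a stand-in (a real system would wire in an LLM for `propose_task` and `solve`, and embedding retrieval instead of word overlap), and all names are illustrative.

```python
class TinyLibrary:
    """Verified skills keyed by task description; naive word-overlap retrieval."""
    def __init__(self):
        self.skills = {}

    def retrieve(self, task):
        words = set(task.split())
        return [code for t, code in self.skills.items() if words & set(t.split())]

    def add(self, task, code):
        self.skills[task] = code

def voyager_loop(propose_task, solve, library, iterations=3):
    for _ in range(iterations):
        task = propose_task(library)        # 1. decide what to learn
        context = library.retrieve(task)    # 2. pull similar past skills
        skill = solve(task, context)        # 3. write, test, refine, verify
        if skill is not None:
            library.add(task, skill)        # 4. store only working solutions
    return library                          # 5. the library compounds over time

# Fake curriculum and solver to show the flow.
tasks = iter(["chop wood", "craft wood planks", "mine iron"])
propose = lambda lib: next(tasks)
solve = lambda task, ctx: f"verified code for {task!r} (built on {len(ctx)} skills)"

lib = voyager_loop(propose, solve, TinyLibrary())
```

Note how "craft wood planks" retrieves the earlier "chop wood" skill via shared vocabulary: each iteration starts with more building blocks than the last.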
When to Use This
- Open-ended exploration where accumulated skills compound over time
- Domains with executable feedback (code runs, tests pass, game state changes)
- Building a transferable library of reusable AI capabilities
- Long-horizon tasks where past solutions are relevant to future problems
When to Skip This
- One-shot tasks — no benefit from skill accumulation if you won't face similar problems
- No executable feedback — the iterative refinement loop needs clear success/failure signals
- Real-time applications — iterative code refinement takes time
- Non-code domains — the pattern works best when skills can be expressed as executable code
How It Relates
Voyager adds what AutoGPT/BabyAGI lacks: skill accumulation. Where autonomous agents repeat mistakes, Voyager stores solutions and builds on them. The Cognitive Loop can invoke Voyager when it encounters a skill gap. JARVIS can expose learned skills as callable tools for other agents.
At Level 4, Self-Improving Systems extend this concept beyond skills to the system's own architecture and prompts, and World Model Agents add simulation to predict which skills to learn before committing to execution.