The Idea
Most AI agents solve each problem from scratch. They don't remember what worked before or build on past successes. Voyager is different — it's a self-teaching agent that continuously learns new skills, stores them, and reuses them. Each solved problem becomes a building block for harder problems.
Originally demonstrated in Minecraft (where it taught itself to go from chopping wood to crafting diamond tools with zero human intervention), the pattern generalizes to any domain with executable feedback. The agent decides what to learn next, writes code to accomplish it, tests it, refines until it works, and stores the verified solution in a growing skill library.
Component Patterns
This system composes Level 2 patterns into a learning loop:
- Meta-Prompting generates the curriculum (what to learn next).
- RAG retrieves relevant existing skills.
- Reflexion handles the iterative refinement loop.
- Program of Thoughts grounds everything in executable code.
Three Interlocking Loops
Curriculum
"What should I learn next?" The agent examines its current capabilities and proposes the next challenge — achievable, novel, and progressive.
Skill Creation
Write code to accomplish the task. Execute it. If it fails, read the error, refine the code, and try again. Repeat until it works or give up.
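The write-execute-refine loop reduces to a short control structure. In this sketch, `generate_code` stands in for an LLM call and `execute` for the environment; both names are assumptions for illustration.

```python
def refine_until_success(task, generate_code, execute, max_attempts=4):
    """Iteratively regenerate code, feeding execution errors back as context."""
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate_code(task, feedback)
        ok, result = execute(code)
        if ok:
            return code, attempt          # verified skill
        feedback = result                 # error message guides the next try
    return None, max_attempts             # give up after the attempt budget

# Fake components to demonstrate the flow: the third draft "works".
drafts = iter(["bad_code_1", "bad_code_2", "good_code"])
gen = lambda task, feedback: next(drafts)
run = lambda code: (code == "good_code",
                    "ok" if code == "good_code" else "NameError: ironOreBlock")

code, attempts = refine_until_success("mine iron", gen, run)
```

The key design choice is that only code passing `execute` ever leaves this loop, so everything downstream (the skill library) contains verified solutions.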
Skill Library
Verified solutions are stored with descriptions and embeddings. Future tasks retrieve similar past skills as context and building blocks.
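Storage and retrieval can be sketched with a toy bag-of-words embedding standing in for a real embedding model; the class and function names here are illustrative, not Voyager's actual API.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. A real system would use a vector model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SkillLibrary:
    def __init__(self):
        self.skills = []  # (description, code, embedding) triples

    def add(self, description, code):
        self.skills.append((description, code, embed(description)))

    def retrieve(self, task, k=2):
        """Return the k stored skills most similar to the task description."""
        q = embed(task)
        ranked = sorted(self.skills, key=lambda s: cosine(q, s[2]), reverse=True)
        return [(desc, code) for desc, code, _ in ranked[:k]]

lib = SkillLibrary()
lib.add("chop wood from a tree", "function chopWood() { /* ... */ }")
lib.add("mine iron ore with a stone pickaxe", "function mineIron() { /* ... */ }")
top = lib.retrieve("mine iron", k=1)
```

Retrieved skills are injected into the prompt for the next task, so similar past solutions arrive as both context and callable building blocks.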
See It in Action
A Minecraft agent teaching itself progressively harder skills.
Attempt 1: "ironOreBlock is not defined" → feedback
Attempt 2: "findBlock is not a function" → feedback
Attempt 3: Uses correct API → SUCCESS. Skill added.
The Growing Skill Library
Each skill is verified code that can be retrieved, composed, and reused.
The Numbers
After 160 iterations in Minecraft, skill accumulation is the difference maker: the agent that stores and reuses verified skills progresses far further than one that starts from scratch each time.
Why This Works
The skill library is the breakthrough. Without it, the agent starts from zero every time — like a student who never takes notes. With the library, solved problems become tools: the agent retrieves relevant past solutions as context and building blocks, dramatically reducing the work needed for new but similar challenges.
The automatic curriculum is equally important. Rather than following a fixed sequence, the agent assesses what it can currently do and proposes the most useful next challenge. This self-directed learning means it focuses effort where it has the highest chance of making progress.
The System
Decide what to learn. Write code to do it. Test, refine, verify. Store the working solution. Retrieve and compose past solutions for harder challenges. A self-teaching agent with a growing skill library.
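Those five steps compose into one loop. This is a hedged end-to-end sketch: every component here is a stand-in (a real system would wire in an LLM for `propose_task` and `solve`, and embedding retrieval instead of word overlap), and all names are illustrative.

```python
class TinyLibrary:
    """Verified skills keyed by task description; naive word-overlap retrieval."""
    def __init__(self):
        self.skills = {}

    def retrieve(self, task):
        words = set(task.split())
        return [code for t, code in self.skills.items() if words & set(t.split())]

    def add(self, task, code):
        self.skills[task] = code

def voyager_loop(propose_task, solve, library, iterations=3):
    for _ in range(iterations):
        task = propose_task(library)        # 1. decide what to learn
        context = library.retrieve(task)    # 2. pull similar past skills
        skill = solve(task, context)        # 3. write, test, refine, verify
        if skill is not None:
            library.add(task, skill)        # 4. store only working solutions
    return library                          # 5. the library compounds over time

# Fake curriculum and solver to show the flow.
tasks = iter(["chop wood", "craft wood planks", "mine iron"])
propose = lambda lib: next(tasks)
solve = lambda task, ctx: f"verified code for {task!r} (built on {len(ctx)} skills)"

lib = voyager_loop(propose, solve, TinyLibrary())
```

Note how "craft wood planks" retrieves the earlier "chop wood" skill via shared vocabulary: each iteration starts with more building blocks than the last.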
When to Use This
- Open-ended exploration where accumulated skills compound over time
- Domains with executable feedback (code runs, tests pass, game state changes)
- Building a transferable library of reusable AI capabilities
- Long-horizon tasks where past solutions are relevant to future problems
When to Skip This
- One-shot tasks — no benefit from skill accumulation if you won't face similar problems
- No executable feedback — the iterative refinement loop needs clear success/failure signals
- Real-time applications — iterative code refinement takes time
- Non-code domains — the pattern works best when skills can be expressed as executable code
How It Relates
Voyager adds what AutoGPT/BabyAGI lacks: skill accumulation. Where autonomous agents repeat mistakes, Voyager stores solutions and builds on them. The Cognitive Loop can invoke Voyager when it encounters a skill gap. JARVIS can expose learned skills as callable tools for other agents.
At Level 4, Self-Improving Systems extend this concept beyond skills to the system's own architecture and prompts, and World Model Agents add simulation to predict which skills to learn before committing to execution.