Think Before You Move
A chess grandmaster doesn't move pieces randomly and see what happens. They mentally simulate sequences of moves, imagining how the board would look 5 or 10 moves ahead, evaluating which future looks best, and only then touching a piece.
World Model Agents work the same way. They build an internal representation of how the world works — what entities exist, how they relate, what happens when you take actions — and use that model to "imagine" the consequences of different choices before actually committing. The agent can even "dream," practicing in imagination to learn without any real-world risk.
Three Parts of a World Model
Encoder
Converts what the agent sees into a structured internal state: entities, relationships, and properties.
Dynamics Model
Predicts what happens next given the current state and an action. The heart of imagination.
Reward Model
Evaluates how good a state is relative to the goal. Enables comparison of imagined futures.
Together, these three components let the agent ask: "If I do this, what happens, and is that outcome good?" — all without actually doing anything in the real world.
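The three components above can be sketched as three plain functions. This is a toy illustration, not a real API: the names `encode`, `predict`, and `reward`, the `State` shape, and the partnership action are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    entities: tuple   # e.g. ("company_a",)
    relations: tuple  # e.g. (("us", "partner_of", "company_a"),)

def encode(observation: dict) -> State:
    """Encoder: raw observation -> structured internal state."""
    return State(
        entities=tuple(observation["entities"]),
        relations=tuple(observation["relations"]),
    )

def predict(state: State, action: str) -> State:
    """Dynamics model: (state, action) -> predicted next state."""
    if action == "propose_partnership":
        return State(state.entities,
                     state.relations + (("us", "partner_of", state.entities[0]),))
    return state  # unknown actions leave this toy model's state unchanged

def reward(state: State, goal_relation: str) -> float:
    """Reward model: how well does this state satisfy the goal?"""
    return sum(1.0 for (_, rel, _) in state.relations if rel == goal_relation)

# "If I do this, what happens, and is that outcome good?" -- without acting:
s0 = encode({"entities": ["company_a"], "relations": []})
s1 = predict(s0, "propose_partnership")
print(reward(s1, "partner_of"))  # 1.0
```

The real dynamics and reward models would be learned, but the interface stays the same: state in, imagined state and score out.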
Planning Through Imagination
For each decision, the agent imagines multiple possible actions and their consequences, then picks the best path:
Goal: Negotiate a Partnership Deal
Decision: Path A, selected after scoring each imagined future — all simulated in imagination, with zero real-world risk.
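The selection step can be sketched as a simple loop: roll each candidate path forward through the dynamics model, score the imagined end state, and commit only to the best one. The scalar "deal progress" state, the `dynamics` and `score` functions, and the two paths are toy stand-ins, not the real models.

```python
def dynamics(state: float, action: float) -> float:
    return state + action          # toy: state is a scalar "deal progress"

def score(state: float) -> float:
    return -abs(state - 10.0)      # goal: land exactly on a deal progress of 10

def plan(state: float, paths: dict) -> str:
    imagined = {}
    for name, actions in paths.items():
        s = state
        for a in actions:          # simulate the whole path in imagination
            s = dynamics(s, a)
        imagined[name] = score(s)  # evaluate the imagined end state
    return max(imagined, key=imagined.get)  # best-scoring future wins

paths = {
    "path_a": [4.0, 3.0, 3.0],     # steady concessions -> lands on 10
    "path_b": [8.0, 8.0],          # aggressive -> overshoots to 16
}
print(plan(0.0, paths))  # path_a
```

Nothing in the loop touches the real world; only the winning path is ever executed.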
Learning by Dreaming
Practice Without Consequences
The agent doesn't have to wait for real tasks to improve. During idle time, it can "dream" — generating hypothetical scenarios, planning and simulating full trajectories, and analyzing what it learns:
Step 1: Start from the current world state and generate a random practice goal.
Step 2: Plan and simulate the full execution — up to 20 imagined steps.
Step 3: Analyze the imagined trajectory: What patterns emerged? What could go wrong? Any useful skills to extract?
Step 4: Update the dynamics model with insights from the dream.
This is how the agent improves its world understanding without consuming real resources or taking real risks. It's like a pilot using a flight simulator — the experience is virtual, but the learning is real.
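The four dream steps above can be sketched as a single function. The practice goals, the action set, the "surprise" analysis, and the rule update are all toy assumptions chosen for the example.

```python
import random

def dream(state, rules, max_steps=20, seed=0):
    rng = random.Random(seed)
    goal = rng.choice(["close_deal", "expand_market"])  # Step 1: practice goal
    trajectory = [state]
    for _ in range(max_steps):                          # Step 2: simulate fully
        action = rng.choice(list(rules))
        state = rules[action](state)
        trajectory.append(state)
    went_negative = any(s < 0 for s in trajectory)      # Step 3: analyze patterns
    if went_negative:                                   # Step 4: update the model
        rules["clamp"] = lambda s: max(s, 0)
    return goal, trajectory

# Toy dynamics rules: actions nudge a scalar state up or down.
rules = {"push": lambda s: s + 1, "wait": lambda s: s - 1}
goal, traj = dream(0, rules)
print(len(traj))  # the start state plus 20 imagined steps = 21
```

Run overnight with many seeds and goals, this loop accumulates model updates from purely imagined experience.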
Getting Smarter from Mistakes
Every time the agent acts in the real world, it compares what actually happened with what it predicted would happen. When those differ significantly, it learns a new rule:
Learned Dynamics Rules
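The rule-learning step can be sketched as a comparison between predicted and observed state features. The feature dictionaries, the error threshold, and the plain-text rule format are illustrative assumptions.

```python
learned_rules = []

def update_on_outcome(predicted: dict, actual: dict, threshold: float = 0.5):
    """Compare prediction vs. reality; record a rule for each large miss."""
    for key in predicted:
        error = abs(predicted[key] - actual.get(key, 0.0))
        if error > threshold:  # prediction and reality differ significantly
            learned_rules.append(
                f"{key}: predicted {predicted[key]}, observed {actual[key]}"
            )

# The model predicted an acquisition was very likely; it never happened.
update_on_outcome({"competitor_acquires": 0.9}, {"competitor_acquires": 0.0})
print(len(learned_rules))  # 1
```

Each recorded rule is a correction the dynamics model applies the next time it imagines a similar situation.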
In Practice: Investment Decision
The encoder processes market data, company financials, and competitor actions into a structured state: 12 entities (companies), 28 relationships (partnerships, competition), and current trend signals.
LATS runs tree search entirely within the world model. Each branch simulates 5 steps ahead using the dynamics model. Generative Agents predict how competitors would respond to each investment move.
The agent proceeds with the best-scored action. After real outcomes arrive, it compares prediction with reality: Competitor Y didn't acquire Z but launched a new product instead, so the dynamics model learns a new rule about competitor behavior patterns.
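The "5 steps ahead" search can be sketched as a depth-limited recursion that expands actions only through the dynamics model, never in the real market. The two actions, the compounding-value dynamics, and the reward are toy assumptions, not a faithful LATS implementation.

```python
def dynamics(state: float, action: str) -> float:
    """Toy portfolio dynamics: value compounds; investing adds exposure."""
    return state * 1.1 + (1.0 if action == "invest" else 0.0)

def reward(state: float) -> float:
    return state  # toy reward: final portfolio value

def search(state: float, depth: int = 5) -> tuple:
    """Return the best first action and the value of its imagined subtree."""
    best_action, best_value = None, float("-inf")
    for action in ("invest", "hold"):
        nxt = dynamics(state, action)         # simulate one imagined step
        value = reward(nxt) if depth == 1 else search(nxt, depth - 1)[1]
        if value > best_value:
            best_action, best_value = action, value
    return best_action, best_value

action, value = search(0.0, depth=5)
print(action)  # invest
```

A real LATS search would add sampling, backpropagated value estimates, and simulated competitor responses at each branch; the skeleton of imagined expansion and scoring stays the same.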
What Makes This Different
Other architectures plan by actually trying things or by reasoning abstractly. World Model Agents plan by simulating — testing actions in a mental model that predicts consequences without real-world cost.
This makes them uniquely suited for high-stakes environments where mistakes are expensive. A bad investment, a failed negotiation, a robot collision — all can be "tested" safely in imagination first.
The dreaming capability is especially powerful. The agent can practice thousands of scenarios overnight, building expertise from imagined experience. And every real-world action improves the model, creating a virtuous cycle: better predictions lead to better planning, which leads to better outcomes, which produce better training data.
Component Systems
The world model integrates with these Level 3 systems for planning and execution:
- • Generative Agents (Entity Simulation)
- • LATS (Tree Search Planning)
- • Voyager (Skill Learning)
- • Cognitive Loop (Reasoning)
The Core Idea
Don't learn by trial and error when mistakes are costly. Build an internal model of how the world works, test decisions in imagination, and only act on the best-scoring future you can envision.
When to Use This
- • Strategic decisions where the dynamics are at least partially predictable — business, games, logistics, negotiations
- • Real mistakes are costly — financial, safety, or reputational risk means you want to "test" actions mentally first
- • Long-horizon planning in large problem spaces where exhaustive real exploration is impractical
- • Multi-agent scenarios where modeling what others will do is crucial to success
When to Skip This
- • Highly unpredictable or chaotic environments — world model predictions will be unreliable if there are no repeating patterns
- • Real-time latency requirements — imagination adds computational overhead to every decision
- • Real experience is cheap and safe — direct trial-and-error may be more efficient than building a world model
- • Simple, one-step decisions that don't benefit from multi-step simulation
How It Relates
- • Embodied Cognitive Architecture adds physical grounding to world models — connecting imagined plans to real-world sensors and actuators
- • Hierarchical Agent Architecture shares multi-layer planning but without internal simulation — it plans through abstraction levels, not through imagined futures
- • LATS (Level 3) does tree search through real actions; World Model Agents let LATS search through imagined actions instead — cheaper, faster, safer