Conceptual Framework — This page describes a theoretical architecture synthesized from published research, not a single proven technique. The building blocks are real; the overall design is a blueprint for how they could fit together.

Think Before You Move

A chess grandmaster doesn't move pieces randomly and see what happens. They mentally simulate sequences of moves, imagining how the board would look 5 or 10 moves ahead, evaluating which future looks best, and only then touching a piece.

World Model Agents work the same way. They build an internal representation of how the world works — what entities exist, how they relate, what happens when you take actions — and use that model to "imagine" the consequences of different choices before actually committing. The agent can even "dream," practicing in imagination to learn without any real-world risk.

Three Parts of a World Model

Encoder

Converts what the agent sees into a structured internal state: entities, relationships, and properties.

Observation → Internal State

Dynamics Model

Predicts what happens next given the current state and an action. The heart of imagination.

State + Action → Next State

Reward Model

Evaluates how good a state is relative to the goal. Enables comparison of imagined futures.

State + Goal → Score (0–1)

Together, these three components let the agent ask: "If I do this, what happens, and is that outcome good?" — all without actually doing anything in the real world.
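The three components can be sketched as plain function slots. Everything here is illustrative — the `WorldModel` class, the single `progress` property, and the toy lambdas are hypothetical stand-ins, not a real implementation:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class WorldModel:
    encode: Callable[[dict], dict]          # Observation -> Internal State
    dynamics: Callable[[dict, str], dict]   # State + Action -> Next State
    reward: Callable[[dict, dict], float]   # State + Goal -> Score (0-1)

    def evaluate_action(self, observation: dict, action: str, goal: dict) -> float:
        """Ask: 'If I do this, what happens, and is that outcome good?'"""
        state = self.encode(observation)
        next_state = self.dynamics(state, action)
        return self.reward(next_state, goal)

# Toy instantiation: a single numeric "progress" property.
model = WorldModel(
    encode=lambda obs: {"progress": obs.get("progress", 0.0)},
    dynamics=lambda s, a: {"progress": s["progress"] + (0.3 if a == "collaborate" else -0.1)},
    reward=lambda s, g: max(0.0, min(1.0, s["progress"] / g["target"])),
)

score = model.evaluate_action({"progress": 0.4}, "collaborate", {"target": 1.0})
```

Note that `evaluate_action` never touches the real world — it only composes the three functions, which is the whole point of the architecture.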

Planning Through Imagination

For each decision, the agent imagines multiple possible actions and their consequences, then picks the best path:

Goal: Negotiate a Partnership Deal

Path A: Lead with Shared Benefits
Imagine: Open with mutual wins → Partner engages positively → Counter-offer is reasonable → Agreement in 2 rounds → Both sides satisfied
Predicted value: 0.87
Path B: Lead with Our Demands
Imagine: Open with our needs → Partner gets defensive → Negotiations stall → Eventually settle but with resentment → Partnership is fragile
Predicted value: 0.52
Path C: Aggressive Ultimatum
Imagine: Present take-it-or-leave-it → Partner walks away → Reputation damage → Future partnerships harder
Predicted value: 0.15

Decision: Path A — all simulated in imagination, zero real-world risk.
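A minimal planner over this kind of imagination just scores each candidate action's simulated trajectory and keeps the best. The trajectories and values below are toy stand-ins mirroring the negotiation example, not output from a real model:

```python
def plan(actions, simulate, value):
    """Imagine each action's trajectory, score it, pick the best."""
    scored = {a: value(simulate(a)) for a in actions}
    best = max(scored, key=scored.get)
    return best, scored

# Hypothetical imagined trajectories and a value table keyed on final outcomes.
TRAJECTORIES = {
    "shared_benefits": ["partner engages", "reasonable counter", "agreement"],
    "our_demands": ["partner defensive", "stall", "fragile deal"],
    "ultimatum": ["partner walks away", "reputation damage"],
}
VALUES = {"agreement": 0.87, "fragile deal": 0.52, "reputation damage": 0.15}

best, scores = plan(
    TRAJECTORIES.keys(),
    simulate=lambda a: TRAJECTORIES[a],
    value=lambda traj: VALUES[traj[-1]],
)
# best == "shared_benefits"
```

In a real agent, `simulate` would roll the dynamics model forward and `value` would be the reward model; here both are lookup tables so the selection logic stands out.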

Learning by Dreaming

Practice Without Consequences

The agent doesn't have to wait for real tasks to improve. During idle time, it can "dream" — generating hypothetical scenarios, planning and simulating full trajectories, and analyzing what it learns:

Step 1: Start from the current world state and generate a random practice goal.
Step 2: Plan and simulate the full execution — up to 20 imagined steps.
Step 3: Analyze the imagined trajectory: What patterns emerged? What could go wrong? Any useful skills to extract?
Step 4: Update the dynamics model with insights from the dream.
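The four dream steps above can be sketched as a loop. `DreamModel` and its methods are illustrative stubs under the assumption that the agent exposes a policy, a dynamics model, and a trajectory analyzer:

```python
import random

class DreamModel:
    """Toy stand-in for a learned world model; all behavior is illustrative."""
    def policy(self, state, goal):
        return "advance"                       # pick an action toward the goal
    def dynamics(self, state, action):
        return state + 1                       # imagined next state
    def analyze(self, trajectory):
        return {"steps": len(trajectory)}      # extracted patterns / skills

def dream(model, start_state, update, n_dreams=5, horizon=20,
          goals=("negotiate", "acquire", "retain")):
    for _ in range(n_dreams):
        goal = random.choice(goals)            # Step 1: random practice goal
        state, trajectory = start_state, []
        for _ in range(horizon):               # Step 2: simulate up to 20 steps
            action = model.policy(state, goal)
            state = model.dynamics(state, action)
            trajectory.append((action, state))
        update(model.analyze(trajectory))      # Steps 3-4: analyze, then update

insights = []
dream(DreamModel(), 0, insights.append)
# insights now holds one analysis per dream, with no real-world actions taken
```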

This is how the agent improves its world understanding without consuming real resources or taking real risks. It's like a pilot using a flight simulator — the experience is virtual, but the learning is real.

Getting Smarter from Mistakes

Every time the agent acts in the real world, it compares what actually happened with what it predicted would happen. When those differ significantly, it learns a new rule:

Learned Dynamics Rules

Prediction Error
Predicted the client would accept the proposal; client asked for more time instead.
Extracted Rule
IF the proposal involves a budget increase AND the client hasn't been consulted on budget expectations THEN the likely response is "needs review" rather than acceptance.
Applied Going Forward
Next time a budget-increasing proposal is considered, the dynamics model factors in this rule — predicting the need for a pre-consultation step, leading to better planning.
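One way to realize this rule extraction, as a hedged sketch: store each learned rule as a (condition, corrected outcome) pair that the dynamics model consults before falling back on its default prediction. The class, field names, and proposal dict are all hypothetical:

```python
def default_predict(state):
    """Base dynamics model: naively predicts acceptance."""
    return "accept"

class RuleAugmentedDynamics:
    def __init__(self, base):
        self.base = base
        self.rules = []  # list of (condition_fn, outcome) pairs

    def predict(self, state):
        for condition, outcome in self.rules:
            if condition(state):
                return outcome               # learned rule overrides the base
        return self.base(state)

    def learn_from_error(self, state, predicted, actual, condition):
        if predicted != actual:              # significant prediction error
            self.rules.append((condition, actual))

dyn = RuleAugmentedDynamics(default_predict)
proposal = {"budget_increase": True, "budget_consulted": False}
dyn.learn_from_error(
    proposal,
    predicted=dyn.predict(proposal),         # model said "accept"
    actual="needs review",                   # client asked for more time
    condition=lambda s: s["budget_increase"] and not s["budget_consulted"],
)
# The next budget-increasing proposal now predicts "needs review" instead.
nxt = dyn.predict({"budget_increase": True, "budget_consulted": False})
```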

In Practice: Investment Decision

Step 1: Encode the Current State

The encoder processes market data, company financials, and competitor actions into a structured state: 12 entities (companies), 28 relationships (partnerships, competition), and current trend signals.

Step 2: Imagine Multiple Futures

LATS runs tree search entirely within the world model. Each branch simulates 5 steps ahead using the dynamics model. Generative Agents predict how competitors would respond to each investment move.

Best Imagined Path
Invest in Company X → Competitor Y responds with acquisition of Company Z → Market shifts favor our position → Q3 returns: +12%. Score: 0.84 with 0.71 confidence.
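The search in step 2 can be sketched as a depth-limited lookahead run entirely inside the world model. LATS proper uses Monte Carlo tree search with language-model value estimates; this simplified exhaustive variant, with an illustrative one-number "market position" state, only shows the shape of the idea:

```python
def tree_search(state, model, depth=5):
    """Depth-limited lookahead inside the world model.
    Returns (best predicted value, first action of the best branch)."""
    if depth == 0:
        return model.score(state), None
    best_value, best_action = float("-inf"), None
    for action in model.actions(state):
        value, _ = tree_search(model.dynamics(state, action), model, depth - 1)
        if value > best_value:
            best_value, best_action = value, action
    return best_value, best_action

class ToySearchModel:
    """Illustrative stub: a single numeric market-position state, two moves."""
    def actions(self, state):
        return ["invest", "hold"]
    def dynamics(self, state, action):
        return state + (1 if action == "invest" else 0)
    def score(self, state):
        return min(1.0, state / 10)            # reward model on a 0-1 scale

value, first_action = tree_search(0, ToySearchModel(), depth=5)
```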
Step 3: Execute and Learn

Agent proceeds with the best-scored action. After real outcomes arrive, it compares prediction vs. reality. Competitor Y didn't acquire Z but launched a new product instead — the dynamics model learns a new rule about competitor behavior patterns.
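The execute-and-learn step can be sketched as a compare-then-patch loop. The lookup-table model, the `invest_in_X` action, and the competitor outcomes are hypothetical stand-ins for the scenario above:

```python
class PredictiveModel:
    """Illustrative stub: predictions are a lookup table that errors patch."""
    def __init__(self):
        self.predictions = {"invest_in_X": "Y acquires Z"}
        self.corrections = 0
    def predict(self, action):
        return self.predictions[action]
    def update(self, action, actual):
        self.predictions[action] = actual      # learn the new behavior pattern
        self.corrections += 1

def execute_and_learn(model, action, act):
    predicted = model.predict(action)          # what the world model expected
    actual = act(action)                       # what really happened
    if predicted != actual:                    # prediction error -> update
        model.update(action, actual)
    return predicted, actual

model = PredictiveModel()
predicted, actual = execute_and_learn(
    model, "invest_in_X", act=lambda a: "Y launches product"
)
# model.predict("invest_in_X") now returns the observed outcome
```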

What Makes This Different

Other architectures plan by actually trying things or by reasoning abstractly. World Model Agents plan by simulating — testing actions in a mental model that predicts consequences without real-world cost.

This makes them uniquely suited for high-stakes environments where mistakes are expensive. A bad investment, a failed negotiation, a robot collision — all can be "tested" safely in imagination first.

The dreaming capability is especially powerful. The agent can practice thousands of scenarios overnight, building expertise from imagined experience. And every real-world action improves the model, creating a virtuous cycle: better predictions lead to better planning, which leads to better outcomes, which produce better training data.

Component Systems

The world model integrates with these Level 3 systems for planning and execution:

- Generative Agents (entity simulation)
- LATS (tree-search planning)
- Voyager (skill learning)
- Cognitive Loop (reasoning)

The Core Idea

Don't learn by trial and error when mistakes are costly. Build an internal model of how the world works, test decisions in imagination, and only act on the best-scoring future you can envision.
