Conceptual Framework — This page describes a theoretical architecture synthesized from published research, not a single proven technique. The building blocks are real; the overall design is a blueprint for how they could fit together.

From Words to Actions

You say to a robot: "Bring me the blue book from the shelf." Simple for a human. Enormously complex for AI. The robot needs to understand your intent, plan a sequence of physical actions, find the specific blue book among dozens of objects, check whether it can actually reach and grasp it safely, navigate to you, and hand it over — all while avoiding obstacles and not breaking anything.

An Embodied Cognitive Architecture addresses this by connecting LLM-based reasoning with physical sensors and actuators through four specialized layers, each operating at the speed appropriate to its function, from slow deliberation to 100 Hz real-time control.

The Four Layers

Cognitive Layer
Seconds
Understands language commands, reasons about goals, and generates high-level plans. "Go to the shelf, find the blue book, pick it up, bring it to the user." Monitors progress and replans when things go wrong.
Cognitive Loop + World Model + LATS Planning
↓ Symbolic Actions ↑ Status Updates
Grounding Layer
Milliseconds
The critical bridge. Translates "blue book" into a specific detected object at coordinates [x,y,z]. Checks: Can I reach it? Do I have the right skill? Is it safe? This is what most AI architectures are missing.
Language Grounding + Affordance Detection + Safety Checker
↓ Motor Commands ↑ Sensor Data
Control Layer
100 Hz
Real-time physical execution. Plans collision-free trajectories, executes motor skills at high frequency, and reacts instantly to obstacles, unexpected forces, or collisions.
Motion Planning + Reactive Control + Skill Execution
↓ Electrical Signals ↑ Raw Sensor Data
Physical Layer
Hardware
The actual hardware: cameras for seeing, LiDAR for depth, robotic arms for grasping, wheels for moving, force sensors for contact detection. The real world, in all its messy complexity.
Cameras + LiDAR + Arms + Wheels + Force Sensors
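The contract between adjacent layers can be sketched as plain message types that get more concrete as they move down the stack. This is a minimal illustration; the type names and fields (SymbolicAction, MotorCommand, StatusUpdate) are hypothetical, not an API from the source systems.

```python
from dataclasses import dataclass

# Hypothetical message types for the inter-layer contract.
# Each layer speaks only to its neighbors: commands flow down,
# status flows back up.

@dataclass
class SymbolicAction:          # Cognitive -> Grounding
    verb: str                  # e.g. "pick_up"
    target: str                # e.g. "blue book"

@dataclass
class MotorCommand:            # Grounding -> Control
    skill: str                 # e.g. "top_grasp"
    position: tuple            # target [x, y, z] in metres
    grip_force_n: float        # newtons

@dataclass
class StatusUpdate:            # Control -> Grounding -> Cognitive
    ok: bool
    detail: str

cmd = MotorCommand(skill="top_grasp", position=(0.8, 1.2, 0.3),
                   grip_force_n=25.0)
```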

The Grounding Layer: The Key Innovation

Most AI architectures stop at abstract reasoning. The grounding layer is what makes physical action possible — translating between the world of language and the world of physics:

Translating "Pick Up the Blue Book"

1. Language Grounding

Match "blue book" to a specific detected object in the visual scene. Consider appearance, spatial relations, and context. Result: the hardcover at position [0.8, 1.2, 0.3] with 0.94 confidence.
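A toy version of this matching step, assuming detections arrive as labeled dicts: score each detected object by how many words of the referring expression its attributes match. The detection format and scoring rule are illustrative assumptions.

```python
# Minimal sketch of language grounding: score each detected object
# against the words of the referring expression "blue book".

detections = [
    {"label": "book",   "color": "blue", "pos": (0.8, 1.2, 0.3)},
    {"label": "book",   "color": "red",  "pos": (0.7, 1.2, 0.3)},
    {"label": "binder", "color": "blue", "pos": (0.5, 1.1, 0.3)},
]

def ground(expression, detections):
    words = set(expression.lower().split())
    def score(obj):
        # +1 for each attribute (label, color) mentioned in the expression
        return sum(1 for v in (obj["label"], obj["color"]) if v in words)
    best = max(detections, key=score)
    return best, score(best) / len(words)

obj, conf = ground("blue book", detections)
# -> the blue book at (0.8, 1.2, 0.3), confidence 1.0
```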

2. Affordance Detection

What can the robot physically do with this object? It's graspable (top or side grip), liftable (estimated weight within limits), and slideable. Not pourable or openable.
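One way to sketch affordance detection is to derive action possibilities from rough physical properties of the object. The property names and thresholds below are assumptions for illustration, not published values.

```python
# Illustrative affordance detection: what can the robot do with
# an object, given rough estimates of its physical properties?

def affordances(obj):
    afford = set()
    if obj["graspable_width_m"] <= 0.10:   # fits inside the gripper
        afford.add("graspable")
    if obj["est_weight_kg"] <= 2.0:        # within the payload limit
        afford.add("liftable")
    if obj["flat_bottom"]:
        afford.add("slideable")
    return afford

book = {"graspable_width_m": 0.04, "est_weight_kg": 0.6, "flat_bottom": True}
book_afford = affordances(book)
# -> {"graspable", "liftable", "slideable"}
```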

3. Feasibility Check

Is the object reachable? Yes, within workspace. Does the robot know how to grasp? Yes, "top_grasp" skill available. Is it safe? No humans nearby, no fragile items at risk, clearance sufficient.

4. Skill Mapping

Map "pick up" + "graspable" to the "top_grasp" motor skill with parameters: target position [0.8, 1.2, 0.3], grip force 25 N, approach angle 90°. Ready for the control layer.
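Steps 3 and 4 can be sketched together as one function that either emits a motor command or a reason for failure. The workspace radius, skill table, and parameter values are illustrative assumptions.

```python
# Sketch of the feasibility check and skill mapping steps:
# verify reach and safety, then look up a motor skill.

SKILLS = {("pick_up", "graspable"): "top_grasp"}  # (verb, affordance) -> skill
WORKSPACE_RADIUS_M = 1.6                          # assumed arm reach

def map_to_skill(verb, obj, affords, humans_nearby):
    x, y, z = obj["pos"]
    if (x**2 + y**2 + z**2) ** 0.5 > WORKSPACE_RADIUS_M:
        return None, "target outside workspace"
    if humans_nearby:
        return None, "human in workspace"
    for afford in affords:
        skill = SKILLS.get((verb, afford))
        if skill:
            return {"skill": skill, "target": obj["pos"],
                    "grip_force_n": 25.0, "approach_deg": 90}, "ok"
    return None, f"no skill for '{verb}'"

cmd, status = map_to_skill("pick_up", {"pos": (0.8, 1.2, 0.3)},
                           {"graspable", "liftable"}, humans_nearby=False)
# -> status "ok", skill "top_grasp"
```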

If any step fails — object not found, not reachable, not safe — the grounding layer sends a detailed explanation back to the cognitive layer, which can replan. "Blue book is behind the red binder; try moving the binder first."

In Practice: "Bring Me the Blue Book"

1. Cognitive: Understand and Plan

Cognitive Loop parses the intent: fetch an object. LATS searches over the available skills and the current world-model state to generate a plan.

Plan
Navigate to shelf → Locate blue book → Grasp book → Navigate to user → Deliver book. Contingency: if book is blocked, clear obstruction first.
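The plan above can be represented as an ordered step list plus a contingency table keyed by failure reason. The step strings and failure names are illustrative, not the actual LATS output format.

```python
# A plan as the cognitive layer might emit it: main steps plus
# contingencies for known failure modes.

plan = {
    "steps": ["navigate(shelf)", "locate(blue book)", "grasp(blue book)",
              "navigate(user)", "deliver(blue book)"],
    "contingencies": {
        "target_blocked": ["move(obstruction)", "grasp(blue book)"],
    },
}

def next_steps(plan, failure=None):
    if failure is None:
        return plan["steps"]                         # nominal execution
    return plan["contingencies"].get(failure, ["replan"])  # recover or replan
```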
2. Grounding: Translate to Physical Actions

For "locate blue book": language grounding matches the description to an object on the second shelf. Affordance detection confirms it's graspable. Feasibility check: reachable, skill available, safe.

3. Control: Execute with Safety

Motion planning generates a collision-free trajectory. The arm moves at 100 Hz with reactive control: obstacle avoidance adjusts the path when a chair edge is detected, and force compliance limits grip to 25 N to avoid damage.

Real-Time Adjustment
Midway through navigation, a person walks into the path. Reactive control immediately slows to zero. Once the path clears, execution resumes. Total pause: 3 seconds. The cognitive layer didn't need to know — the control layer handled it.
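The slow-to-zero-and-resume behavior can be sketched as a per-tick speed law: every 10 ms cycle, scale the commanded speed by clearance to the nearest obstacle. The margin, speed cap, and clearance trace are illustrative assumptions.

```python
# Sketch of reactive slowdown: each 10 ms control tick, command a
# speed proportional to obstacle clearance, zero inside the margin.

SAFETY_MARGIN_M = 0.1
MAX_SPEED = 1.0  # m/s cap enforced by the control layer

def commanded_speed(clearance_m):
    if clearance_m <= SAFETY_MARGIN_M:
        return 0.0                                   # person in path: stop
    return min(MAX_SPEED, clearance_m - SAFETY_MARGIN_M)

# Simulated clearance readings as a person crosses, then leaves, the path:
trace = [commanded_speed(c) for c in (1.5, 0.4, 0.05, 0.05, 0.6)]
# -> approximately [1.0, 0.3, 0.0, 0.0, 0.5]
```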
4. Cognitive: Monitor and Confirm

After each step, the cognitive layer checks: Is the book now in the gripper? Did I reach the user? World model updated with the book's new location. Plan marked complete.

Safety at Every Layer

Physical actions have real consequences. Safety isn't a single checkpoint — it's enforced at every layer:

Defense in Depth

Cognitive
Plans avoid unsafe goals entirely. Won't plan to move heavy objects near people or operate near hazardous materials.
Grounding
Checks every action for human safety, property damage, robot damage, and collision risk before execution begins.
Control
Real-time obstacle avoidance (0.1 m margin), force compliance (50 N limit), and speed limiting (1.0 m/s cap) at 100 Hz.
Physical
Hardware emergency stops, torque limiters, and bumper contact sensors as the absolute last line of defense.
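Defense in depth can be sketched as a chain of independent predicates: an action executes only if every layer's check passes, so a failure at any level vetoes it. The specific checks and limits below are illustrative assumptions.

```python
# Defense-in-depth sketch: each layer contributes an independent
# safety predicate; all must pass before execution.

def cognitive_ok(action):   # refuse unsafe goals outright
    return action.get("hazardous_material") is not True

def grounding_ok(action):   # pre-execution safety screen
    return not action.get("humans_nearby") and action.get("clearance_m", 0) > 0.1

def control_ok(action):     # real-time limits (speed, force)
    return action.get("speed_mps", 0) <= 1.0 and action.get("force_n", 0) <= 50

def safe_to_execute(action):
    return all(check(action) for check in (cognitive_ok, grounding_ok, control_ok))

action = {"humans_nearby": False, "clearance_m": 0.4,
          "speed_mps": 0.8, "force_n": 25}
# -> safe_to_execute(action) is True
```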

What Makes This Different

Most meta-architectures operate in a purely digital world. This one crosses the symbol-grounding gap: connecting abstract concepts like "blue book" with physical coordinates, graspability assessments, and motor trajectories.

The dedicated grounding layer is the critical innovation. Without it, you either have abstract planners that can't physically execute, or reactive controllers that can't reason about goals. The grounding layer bridges both worlds, checking feasibility before any action begins.

And the multi-timescale design means the cognitive layer can think slowly and carefully (seconds) while the reactive controller ensures safety at 100 Hz. The robot can deliberate about strategy while still dodging an obstacle in 10 milliseconds.
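The multi-timescale idea can be sketched as one loop with two rates: a reactive handler fires every 10 ms tick, while deliberation fires once per second. The tick counts and event names are illustrative.

```python
# Multi-timescale sketch: fast reactive control every tick,
# slow deliberation every 100th tick.

FAST_HZ = 100        # reactive control: one tick every 10 ms
SLOW_EVERY = 100     # deliberate once per 100 fast ticks (1 s)

events = []
for tick in range(250):              # simulate 2.5 seconds
    events.append("react")           # always: dodge obstacles, limit force
    if tick % SLOW_EVERY == 0:
        events.append("deliberate")  # occasionally: replan at LLM speed

# 250 reactive ticks, 3 deliberations (at t = 0 s, 1 s, 2 s)
```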

Component Systems

The cognitive layer integrates these Level 3 systems with physical world interfaces:

Cognitive Loop (Reasoning) · World Model (State Tracking) · LATS (Planning) · Voyager (Skill Learning)

The Core Idea

Bridge the gap between language and physics. A dedicated grounding layer translates abstract reasoning into feasible physical actions, with safety checks at every layer from planning to 100 Hz reactive control.

When to Use This

When to Skip This

How It Relates