The Idea

Traditional prompt engineering is like writing assembly code: you're hand-crafting exact text strings, testing them by feel, and hoping they work. When the model changes, your prompts break. When the task changes, you start over from scratch.

DSPy treats AI like software. You declare what each step should do ("given context and a question, produce an answer"), compose steps into modules, and then a compiler automatically discovers the best prompts, examples, and configurations. You never write a prompt string. You write a program, and DSPy figures out how to talk to the model.

Building Blocks

This composition builds on:

Chain It
APE

DSPy takes prompt chaining (composing multi-step pipelines) and automatic prompt optimization (APE), and wraps them in a full programming framework with a compiler that optimizes everything together.

The Three Layers

1. Signatures

Declare what each step does: "context, question → answer." No prompt text — just input and output descriptions. The compiler handles the wording.
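A signature really is just named inputs and outputs, with no prompt text attached. As a toy sketch (plain Python, not DSPy's actual internals), the "context, question → answer" declaration can be parsed into its field lists:

```python
# Toy model of a signature: the declaration carries only field names,
# never prompt wording. (Illustrative sketch, not the real DSPy parser.)

def parse_signature(spec: str):
    """Split an 'inputs -> outputs' declaration into field-name lists."""
    inputs, outputs = spec.split("->")
    return (
        [field.strip() for field in inputs.split(",")],
        [field.strip() for field in outputs.split(",")],
    )

inputs, outputs = parse_signature("context, question -> answer")
print(inputs)   # ['context', 'question']
print(outputs)  # ['answer']
```

Everything about wording, formatting, and few-shot examples is left to the compiler; the signature pins down only the transformation's shape.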

2. Modules

Compose signatures into pipelines. Chain-of-thought, retrieval, ReAct agents — all available as building blocks you snap together like LEGO.

3. Compiler

Give it examples and a quality metric. It automatically finds the best prompts, selects the best few-shot examples, and optimizes the whole pipeline together.
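Concretely, "examples and a quality metric" can be as small as a handful of labeled pairs plus a scoring function. A hedged sketch in plain Python (DSPy has its own Example and metric conventions; this just shows the shape of the inputs the compiler needs):

```python
# A minimal training set and quality metric, as plain Python.

def exact_match(example, prediction):
    """Score True if the prediction matches the labeled answer."""
    return prediction.strip().lower() == example["answer"].strip().lower()

trainset = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is 2 + 2?", "answer": "4"},
]

print(exact_match(trainset[0], " Paris "))  # True
```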

See It in Action

Building a question-answering system that searches a knowledge base and then reasons about what it found.

1. Declare what you need

Signature:
"Given retrieved context and a question, produce an answer in 1–2 sentences."

No prompt text written. Just a semantic description of the transformation.

2. Compose into a module

Module structure:
Retrieve: Search the knowledge base for the 3 most relevant passages
ChainOfThought: Reason about the passages to answer the question

Two building blocks snapped together. Like calling functions — not crafting prompts.
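A minimal sketch of that two-step module, using a word-overlap retriever and a stand-in for the reasoning step (mock components, not the real DSPy Retrieve and ChainOfThought classes):

```python
import re

def words(text):
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(corpus, question, k=3):
    """Rank passages by word overlap with the question; return the top k."""
    q = words(question)
    return sorted(corpus, key=lambda p: -len(q & words(p)))[:k]

def chain_of_thought(passages, question):
    """Stand-in for an LM reasoning step: here, just cite the top passage."""
    return f"Reasoning over {len(passages)} passages -> {passages[0]}"

corpus = [
    "Paris is the capital of France.",
    "The Nile is the longest river in Africa.",
    "France is a country in Western Europe.",
    "Honey never spoils.",
]
question = "What is the capital of France?"
passages = retrieve(corpus, question)       # top-3 most relevant passages
print(chain_of_thought(passages, question))
```

In real DSPy both steps would be model-backed modules; the point of the sketch is that the pipeline is two callables snapped together, with no prompt strings in sight.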

3. Compile with examples

The compiler runs your module on training examples, finds the runs that score highest, extracts the best prompts and few-shot examples, and bakes them into the module.
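That loop can be sketched in a few lines. Here is a toy version of the compiler's search, with a deterministic mock standing in for the model (illustrative only, not DSPy's actual optimizer):

```python
# Toy compiler search: try candidate instructions, score each on the
# training set, keep the best.

def compile_module(candidates, trainset, metric, run):
    scores = {instr: sum(metric(ex, run(instr, ex)) for ex in trainset)
              for instr in candidates}
    return max(candidates, key=scores.get)

def mock_run(instruction, example):
    """Deterministic stand-in for a model call: pretend only the
    step-by-step prompt produces correct answers."""
    return example["answer"] if "step by step" in instruction else "unsure"

def exact_match(example, prediction):
    return prediction == example["answer"]

trainset = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "What is the capital of France?", "answer": "Paris"},
]
candidates = [
    "Answer the question.",
    "Think step by step, then answer.",
]

best = compile_module(candidates, trainset, exact_match, mock_run)
print(best)  # Think step by step, then answer.
```

DSPy's real optimizers search a much larger space, including which few-shot examples to include, but the principle is the same: candidates in, metric-scored runs out, best configuration baked in.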

Result: 25%+ improvement over standard few-shot, with zero manual prompt tuning.

The Paradigm Shift

Traditional Prompting

Write prompt strings by hand
Test by eye — "does this look right?"
Break when models change
Each pipeline is a one-off
Optimization = intuition + trial-and-error

DSPy

Declare signatures, never write prompts
Test with metrics — measured accuracy
Portable across models (recompile)
Modules are reusable building blocks
Optimization = automated compiler search

Why This Works

Hand-writing prompts conflates two things: what you want (the semantic goal) and how to get it (the exact wording). DSPy separates them. You declare the what; the compiler discovers the how. This means when you switch models, you just recompile — the compiler finds new optimal prompts for the new model automatically.
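That recompile-for-a-new-model claim can be illustrated with the same kind of search run against two mock "models" that respond to different prompt styles. Every name here is hypothetical; the sketch only shows why the declared what survives a model swap while the compiled how changes:

```python
# Recompiling the same declared step for two different (mock) models.
# Each model "prefers" a different instruction style; the compiler just
# re-runs its search and finds the new best prompt automatically.

def compile_best(candidates, trainset, model):
    def score(instruction):
        return sum(model(instruction, ex) == ex["answer"] for ex in trainset)
    return max(candidates, key=score)

trainset = [{"question": "What is 2 + 2?", "answer": "4"}]
candidates = ["Be concise.", "Think step by step."]

def model_a(instruction, example):
    # Pretend model A only succeeds with step-by-step prompting.
    return example["answer"] if "step" in instruction else "?"

def model_b(instruction, example):
    # Pretend model B works best with a terse instruction.
    return example["answer"] if "concise" in instruction else "?"

print(compile_best(candidates, trainset, model_a))  # Think step by step.
print(compile_best(candidates, trainset, model_b))  # Be concise.
```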

The compiler also has a major advantage over manual tuning: it can search systematically across thousands of prompt/example combinations, finding configurations a human would never try. Small compiled models can even match the performance of expert-prompted large models.

The Composition

Declare what each step should do. Compose steps into modules. Let a compiler find the best prompts and examples automatically. Programming for AI, not prompt-crafting for AI.


How It Relates

DSPy is the full realization of what APE started: automated prompt optimization. Where APE optimizes a single instruction, DSPy optimizes entire multi-step pipelines — prompts, examples, and module configurations all together.

It also takes a different approach from frameworks like LangChain. LangChain gives you tools to chain prompt calls together, but you still write the prompts yourself. DSPy replaces prompt-writing with declaration and compilation: a higher level of abstraction that trades fine-grained control for systematic optimization.