The Idea

You can't fine-tune GPT-4 or Claude — they're behind an API. But you can train a tiny model to write better prompts for them. That's the insight behind Directional Stimulus Prompting: a small, trainable "policy model" learns to generate instance-specific hints that guide the large model's response.

Instead of one-size-fits-all instructions, each input gets its own set of tailored keywords, key points, or guiding phrases. The large model sees these hints alongside the input and produces a more focused, higher-quality response. With as few as 80 labeled examples, this approach can improve ChatGPT's performance by over 40% on dialogue tasks.

Building Blocks

This composition builds on:

Show by Example
Give It a Role

Directional Stimulus automates what you'd do manually with good prompting: providing relevant context and guidance. Instead of hand-crafting hints, a small model generates them specifically for each input.

The Two-Model Flow

Step 1: Small Policy Model (Generate Hints). Trained on your task. Produces keywords and key points specific to each input. Cheap and fast to run.

Step 2: Large LLM (Generate Response). Receives the original input plus the hints. Produces a higher-quality, more focused response than it would alone.
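The two-step flow above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `toy_policy` and `toy_llm` are stand-ins for a real fine-tuned policy model and an LLM API call, and the prompt wording is an assumption.

```python
# Sketch of the two-model flow: a small policy model produces hints,
# a large model answers with those hints injected into its prompt.

def generate_hints(policy_model, text: str) -> list[str]:
    """Step 1: small policy model produces instance-specific keywords."""
    return policy_model(text)

def respond_with_hints(large_llm, text: str, hints: list[str]) -> str:
    """Step 2: large model sees the input plus the hints."""
    prompt = (
        "Summarize this article.\n"
        f"Key points to cover: {', '.join(hints)}.\n\n"
        f"{text}"
    )
    return large_llm(prompt)

# Toy stand-ins so the flow runs end to end:
toy_policy = lambda text: ["rising temperatures", "policy responses"]
toy_llm = lambda prompt: f"[summary guided by: {prompt.splitlines()[1]}]"

hints = generate_hints(toy_policy, "article text here")
print(respond_with_hints(toy_llm, "article text here", hints))
```

In practice the lambdas would be replaced by a call to your fine-tuned small model and your LLM provider's API; the structure of the flow stays the same.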

See It in Action

Task: Summarize a long article about climate change.

Step 1: The small policy model generates targeted hints for this article.

Hints generated: rising temperatures, policy responses, economic impact, scientific consensus, renewable energy.

The hints are injected into the prompt.

Step 2: The large LLM summarizes with the hints as guidance. The prompt the large model sees:

Summarize this article.
Key points to cover: rising temperatures, policy responses, economic impact, scientific consensus, renewable energy.
[article text...]

Result: a focused, comprehensive summary that hits all the key themes, not just whatever the model happened to latch onto first.

Without hints, the model might focus on just one angle. With hints, it covers what matters.
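The prompt-assembly step shown above can be captured in a small helper. The template wording here mirrors the example but is an assumption for illustration, not a fixed format required by the method.

```python
# Build the final prompt the large model sees: task instruction,
# then the policy model's hints, then the raw input.

def build_prompt(article: str, hints: list[str]) -> str:
    return (
        "Summarize this article.\n"
        f"Key points to cover: {', '.join(hints)}.\n\n"
        f"{article}"
    )

hints = ["rising temperatures", "policy responses", "economic impact",
         "scientific consensus", "renewable energy"]
print(build_prompt("[article text...]", hints))
```

Because the hints are plain natural language, this injection point is also where you can log and inspect exactly what guidance each input received.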

Another Example: Dialogue

User says: "I want to book a hotel in Paris."

Policy model hints: location=Paris dates=? rooms=? budget=?

LLM response (guided by hints): "Great, Paris is a wonderful choice! To help you find the perfect hotel, could you tell me your travel dates, how many rooms you need, and if you have a budget in mind?"

The hints told the model exactly which missing information to ask about — instead of giving a generic response.
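For task-oriented dialogue, hints can take the form of slot/value pairs, with "?" marking information still missing. A hypothetical formatter (the slot names and syntax are illustrative):

```python
# Format dialogue-state hints: known slots get their value,
# unknown slots get "?" so the LLM knows what to ask about.

def format_dialogue_hints(slots: dict) -> str:
    return " ".join(
        f"{name}={value if value is not None else '?'}"
        for name, value in slots.items()
    )

slots = {"location": "Paris", "dates": None, "rooms": None, "budget": None}
print(format_dialogue_hints(slots))  # location=Paris dates=? rooms=? budget=?
```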

Why This Works

Large models are capable but easily distracted. A long article has many potential angles; a dialogue could go many directions. Hints act like a GPS: they don't drive the car, but they point it in the right direction. The large model still does the heavy lifting of generating fluent, coherent text.

Training the hint-generating model is cheap because it's small. And because hints are natural language (not learned embeddings), you can inspect and understand exactly what guidance the model is giving. It's a transparent steering mechanism.
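As a toy illustration of the training signal: the policy model is typically warmed up with supervised fine-tuning and then refined with reinforcement learning, where the reward is a downstream quality metric (such as ROUGE for summarization or success rate for dialogue). The `hint_reward` function below is a hypothetical stand-in that scores hints by overlap with a reference output.

```python
# Toy proxy reward: fraction of generated hints that actually appear
# in the reference output. Real training uses task metrics as reward.

def hint_reward(hints: list[str], reference: str) -> float:
    if not hints:
        return 0.0
    ref = reference.lower()
    return sum(h.lower() in ref for h in hints) / len(hints)

print(hint_reward(
    ["rising temperatures", "budget"],
    "The report warns of rising temperatures.",
))  # 0.5
```

Because the reward only needs the final output and a reference, the large model stays frozen throughout; only the small policy model's weights are updated.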

The Composition

Train a small model to generate targeted hints for each input. Feed those hints to the large model alongside the task. The small model steers; the large model drives. Better results without fine-tuning the big model.

How It Relates

Directional Stimulus is related to APE (Automatic Prompt Engineer) but with a key difference: APE finds one best prompt for all inputs, while DSP generates instance-specific hints tailored to each input. It's the difference between a universal GPS route and turn-by-turn navigation that adapts to where you are.

It also connects to DSPy's compilation approach — both optimize what gets sent to the model. DSPy optimizes the prompt template; Directional Stimulus optimizes the per-input guidance. In production systems, you might use both together.