The Idea
You can't fine-tune GPT-4 or Claude — they're behind an API. But you can train a tiny model to write better prompts for them. That's the insight behind Directional Stimulus Prompting: a small, trainable "policy model" learns to generate instance-specific hints that guide the large model's response.
Instead of one-size-fits-all instructions, each input gets its own set of tailored keywords, key points, or guiding phrases. The large model sees these hints alongside the input and produces a more focused, higher-quality response. With as few as 80 labeled examples, this approach can improve ChatGPT's performance by over 40% on dialogue tasks.
Building Blocks
This composition builds on:

- Show by Example
- Give It a Role

Directional Stimulus automates what you'd do manually with good prompting: providing relevant context and guidance. Instead of hand-crafting hints, a small model generates them specifically for each input.
The Two-Model Flow
1. Small policy model: generates hints. Trained on your task, it produces keywords and key points specific to each input. Cheap and fast to run.
2. Large frozen model: generates the response. It receives the original input plus the hints and produces a higher-quality, more focused response than it would alone.
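The two-model flow can be sketched in a few lines. Both models are stubbed here: the policy model with naive keyword extraction and the large model with a placeholder, so every function name below is an illustrative assumption. In practice the policy model is a small fine-tuned LM and the large model is an API call (e.g. ChatGPT).

```python
def policy_model(task_input: str) -> str:
    """Small trainable model: emits instance-specific hints.
    Stubbed with naive keyword extraction for illustration."""
    words = [w.strip(".,").lower() for w in task_input.split()]
    keywords = sorted({w for w in words if len(w) > 7})[:5]
    return "Key points to cover: " + ", ".join(keywords)

def large_model(prompt: str) -> str:
    """Frozen API model: never fine-tuned, only prompted.
    Stubbed to show exactly what it receives."""
    return f"[response conditioned on]\n{prompt}"

def directional_stimulus(task_input: str) -> str:
    hints = policy_model(task_input)     # 1. small model generates hints
    prompt = f"{hints}\n\n{task_input}"  # 2. hints are prepended to the input
    return large_model(prompt)           # 3. large model does the heavy lifting
```

Only `policy_model` ever gets trained; `large_model` stays behind its API, which is the whole point of the pattern.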
See It in Action
Task: summarize a long article about climate change. The policy model generates hints, which are prepended to the input:
Key points to cover: rising temperatures, policy responses, economic impact, scientific consensus, renewable energy.
[article text...]
Without hints, the model might focus on just one angle. With hints, it covers what matters.
Another Example: Dialogue
In a dialogue task, the hints tell the model exactly which missing information to ask about, instead of letting it fall back on a generic response.
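In the dialogue setting, hints typically take the form of dialogue acts: which slots the system should ask about next. A minimal sketch of how such hints fold into the prompt (the act names and helper below are illustrative assumptions, not the paper's exact format):

```python
def build_dialogue_prompt(history: list[str], hint_acts: list[str]) -> str:
    """Prepend dialogue-act hints so the large model knows what to ask for."""
    turns = "\n".join(history)
    acts = "; ".join(hint_acts)
    return f"{turns}\nHint: {acts}\nSystem:"

prompt = build_dialogue_prompt(
    ["User: I need a hotel in the city centre."],
    ["request(price range)", "request(number of nights)"],
)
# Conditioned on this prompt, the large model asks about price range
# and length of stay rather than replying generically.
```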
Why This Works
Large models are capable but easily distracted. A long article has many potential angles; a dialogue could go many directions. Hints act like a GPS: they don't drive the car, but they point it in the right direction. The large model still does the heavy lifting of generating fluent, coherent text.
Training the hint-generating model is cheap because it's small. And because hints are natural language (not learned embeddings), you can inspect and understand exactly what guidance the model is giving. It's a transparent steering mechanism.
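The training signal comes from the downstream task: generate hints, let the large model respond, score the response, and reward the hints that scored well (the paper uses supervised fine-tuning followed by RL). The toy stand-in below just scores candidate hints with a word-overlap reward; all names are illustrative assumptions.

```python
def task_metric(response: str, reference: str) -> float:
    """Toy reward: word overlap with a reference output
    (a stand-in for ROUGE or a learned preference score)."""
    resp, ref = set(response.lower().split()), set(reference.lower().split())
    return len(resp & ref) / max(len(ref), 1)

def select_best_hint(candidates, large_model, task_input, reference):
    """Score each candidate hint by the downstream metric and keep the best.
    A real policy model would update its weights on this reward instead
    of searching over a fixed candidate list."""
    scored = [
        (task_metric(large_model(f"{hint}\n\n{task_input}"), reference), hint)
        for hint in candidates
    ]
    return max(scored)[1]
```

Because the reward is computed on the large model's final output, the policy model learns hints that work for *that* model, not hints that merely look plausible.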
The Composition
Train a small model to generate targeted hints for each input. Feed those hints to the large model alongside the task. The small model steers; the large model drives. Better results without fine-tuning the big model.
When to Use This
- Working with API-based models you can't fine-tune (GPT-4, Claude)
- Tasks with clear quality metrics to train the policy model against
- You have some labeled examples — even as few as 80 can work
- Domain adaptation where the large model needs task-specific guidance
When to Skip This
- Can fine-tune directly — if you can train the large model itself, that's more effective
- No labeled data at all — you need at least some examples to train the policy model
- No clear quality signal — without a way to measure hint quality, training the policy is guesswork
- Extreme latency constraints — running two models adds some overhead
How It Relates
Directional Stimulus is related to APE (Automatic Prompt Engineer) but with a key difference: APE finds one best prompt for all inputs, while DSP generates instance-specific hints tailored to each input. It's the difference between a single fixed GPS route and turn-by-turn navigation that adapts to where you are.
It also connects to DSPy's compilation approach — both optimize what gets sent to the model. DSPy optimizes the prompt template; Directional Stimulus optimizes the per-input guidance. In production systems, you might use both together.