The Idea

You can't fine-tune GPT-4 or Claude — they're behind an API. But you can train a tiny model to write better prompts for them. That's the insight behind Directional Stimulus Prompting: a small, trainable "policy model" learns to generate instance-specific hints that guide the large model's response.

Instead of one-size-fits-all instructions, each input gets its own set of tailored keywords, key points, or guiding phrases. The large model sees these hints alongside the input and produces a more focused, higher-quality response. With as few as 80 labeled examples, this approach can improve ChatGPT's performance by over 40% on dialogue tasks.

Building Blocks

This composition builds on:

Show by Example
Give It a Role

Directional Stimulus automates what you'd do manually with good prompting: providing relevant context and guidance. Instead of hand-crafting hints, a small model generates them specifically for each input.

The Two-Model Flow

Step 1: Small Policy Model (Generate Hints). Trained on your task. Produces keywords and key points specific to each input. Cheap and fast to run.

Step 2: Large LLM (Generate Response). Receives the original input plus the hints. Produces a higher-quality, more focused response than it would alone.
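The two-step flow above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `toy_policy` and `toy_llm` are stand-ins for a real fine-tuned policy model and an LLM API call, and the prompt wording is an assumption.

```python
# Sketch of the two-model flow: a small policy model produces hints,
# a large model answers with those hints injected into its prompt.

def generate_hints(policy_model, text: str) -> list[str]:
    """Step 1: small policy model produces instance-specific keywords."""
    return policy_model(text)

def respond_with_hints(large_llm, text: str, hints: list[str]) -> str:
    """Step 2: large model sees the input plus the hints."""
    prompt = (
        "Summarize this article.\n"
        f"Key points to cover: {', '.join(hints)}.\n\n"
        f"{text}"
    )
    return large_llm(prompt)

# Toy stand-ins so the flow runs end to end:
toy_policy = lambda text: ["rising temperatures", "policy responses"]
toy_llm = lambda prompt: f"[summary guided by: {prompt.splitlines()[1]}]"

hints = generate_hints(toy_policy, "article text here")
print(respond_with_hints(toy_llm, "article text here", hints))
```

In practice the lambdas would be replaced by a call to your fine-tuned small model and your LLM provider's API; the structure of the flow stays the same.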

See It in Action

Task: Summarize a long article about climate change.

Step 1: The small policy model generates targeted hints for this article.

Hints generated: rising temperatures, policy responses, economic impact, scientific consensus, renewable energy.

The hints are injected into the prompt.

Step 2: The large LLM summarizes with the hints as guidance. The prompt the large model sees:

Summarize this article.
Key points to cover: rising temperatures, policy responses, economic impact, scientific consensus, renewable energy.
[article text...]

Result: a focused, comprehensive summary that hits all the key themes, not just whatever the model happened to latch onto first.

Without hints, the model might focus on just one angle. With hints, it covers what matters.
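The prompt-assembly step shown above can be captured in a small helper. The template wording here mirrors the example but is an assumption for illustration, not a fixed format required by the method.

```python
# Build the final prompt the large model sees: task instruction,
# then the policy model's hints, then the raw input.

def build_prompt(article: str, hints: list[str]) -> str:
    return (
        "Summarize this article.\n"
        f"Key points to cover: {', '.join(hints)}.\n\n"
        f"{article}"
    )

hints = ["rising temperatures", "policy responses", "economic impact",
         "scientific consensus", "renewable energy"]
print(build_prompt("[article text...]", hints))
```

Because the hints are plain natural language, this injection point is also where you can log and inspect exactly what guidance each input received.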

Another Example: Dialogue

User says: "I want to book a hotel in Paris."

Policy model hints: location=Paris dates=? rooms=? budget=?

LLM response (guided by hints): "Great, Paris is a wonderful choice! To help you find the perfect hotel, could you tell me your travel dates, how many rooms you need, and if you have a budget in mind?"

The hints told the model exactly which missing information to ask about — instead of giving a generic response.
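For task-oriented dialogue, hints can take the form of slot/value pairs, with "?" marking information still missing. A hypothetical formatter (the slot names and syntax are illustrative):

```python
# Format dialogue-state hints: known slots get their value,
# unknown slots get "?" so the LLM knows what to ask about.

def format_dialogue_hints(slots: dict) -> str:
    return " ".join(
        f"{name}={value if value is not None else '?'}"
        for name, value in slots.items()
    )

slots = {"location": "Paris", "dates": None, "rooms": None, "budget": None}
print(format_dialogue_hints(slots))  # location=Paris dates=? rooms=? budget=?
```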

Why This Works

Large models are capable but easily distracted. A long article has many potential angles; a dialogue could go many directions. Hints act like a GPS: they don't drive the car, but they point it in the right direction. The large model still does the heavy lifting of generating fluent, coherent text.

Training the hint-generating model is cheap because it's small. And because hints are natural language (not learned embeddings), you can inspect and understand exactly what guidance the model is giving. It's a transparent steering mechanism.
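As a toy illustration of the training signal: the policy model is typically warmed up with supervised fine-tuning and then refined with reinforcement learning, where the reward is a downstream quality metric (such as ROUGE for summarization or success rate for dialogue). The `hint_reward` function below is a hypothetical stand-in that scores hints by overlap with a reference output.

```python
# Toy proxy reward: fraction of generated hints that actually appear
# in the reference output. Real training uses task metrics as reward.

def hint_reward(hints: list[str], reference: str) -> float:
    if not hints:
        return 0.0
    ref = reference.lower()
    return sum(h.lower() in ref for h in hints) / len(hints)

print(hint_reward(
    ["rising temperatures", "budget"],
    "The report warns of rising temperatures.",
))  # 0.5
```

Because the reward only needs the final output and a reference, the large model stays frozen throughout; only the small policy model's weights are updated.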

The Composition

Train a small model to generate targeted hints for each input. Feed those hints to the large model alongside the task. The small model steers; the large model drives. Better results without fine-tuning the big model.

How It Relates

Directional Stimulus is related to APE (Automatic Prompt Engineer) but with a key difference: APE finds one best prompt for all inputs, while DSP generates instance-specific hints tailored to each input. It's the difference between a universal GPS route and turn-by-turn navigation that adapts to where you are.

It also connects to DSPy's compilation approach — both optimize what gets sent to the model. DSPy optimizes the prompt template; Directional Stimulus optimizes the per-input guidance. In production systems, you might use both together.