The Idea
Here's a surprising finding: when AI gets a multi-step problem wrong and you simply ask "try again," performance actually gets worse with each retry. Vague feedback like "that's wrong, fix it" causes the model to second-guess correct steps while failing to fix the actual error.
Recursive Chain-of-Feedback takes a targeted approach: find the exact step that went wrong, extract it as its own simpler problem, solve that smaller problem (which is much easier to get right), then plug the corrected answer back into the full solution. If the sub-problem is still too hard, decompose it further. It's self-correction that actually works.
Building Blocks
This composition builds on:
- Check Your Work (self-evaluation: finding errors)
- Loop Until Done (iterative improvement)

R-CoF combines self-evaluation (finding errors) with iterative improvement, but adds a crucial element: recursive decomposition. Instead of retrying the whole problem, it isolates and fixes the specific broken piece.
Why "Try Again" Doesn't Work
Naive Retry
Attempt 1: 70% correct
"Try again" → Attempt 2: 65% correct
"Try again" → Attempt 3: 58% correct
Performance degrades because the model overthinks correct steps while missing the real error.
Recursive Chain-of-Feedback
Attempt 1: 70% correct
Find error → Fix step → 85% correct
Find error → Fix step → 92% correct
Performance improves because each correction is targeted and precise.
See It in Action
Question: "A store sells apples for $2 each. John buys 5 apples and pays with a $20 bill. How much change does he get?"

First attempt (contains an error):

Step 1: Cost = 5 × $2 = $12
Step 2: Change = $20 − $12 = $8

R-CoF identifies Step 1 as the broken step and isolates it as its own sub-problem: "What is 5 × $2?" This simpler question is much easier to get right than retrying the whole problem. The model answers $10, and the fix is plugged back into the solution:

Step 1: Cost = 5 × $2 = $10
Step 2: Change = $20 − $10 = $10
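The targeted fix on the apple example can be sketched deterministically. The step table, the use of `eval` to re-solve a step, and the hard-coded faulty draft are all illustrative stand-ins for what an LLM would do; they are not part of R-CoF itself:

```python
# A deterministic sketch of one targeted correction on the apple problem.
# The faulty first draft is hard-coded; in practice an LLM produces it.

steps = [
    # (name, expression, value claimed by the first draft)
    ("cost", "5 * 2", 12),        # the broken step: 5 x 2 is not 12
    ("change", "20 - cost", 8),   # downstream step built on the bad value
]

env = {}
for name, expr, claimed in steps:
    actual = eval(expr, {}, env)  # re-solve the isolated sub-problem
    if claimed != actual:
        print(f"step '{name}': claimed {claimed}, corrected to {actual}")
        claimed = actual          # plug the fix back in
    env[name] = claimed

print(env["change"])  # 10
```

Note that only the steps that actually disagree with a recomputation get touched; correct steps pass through unchanged, which is exactly why this avoids the degradation of blind retries.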
The Recursive Part
What if the sub-problem is also too hard? The technique applies itself recursively. Imagine a complex physics problem where the error is in a calculus step, and the calculus sub-problem has an algebra error. R-CoF would:
1. Identify the calculus step as wrong
2. Isolate it as a calculus sub-problem
3. Attempt the calculus — find an algebra error within it
4. Isolate the algebra as an even simpler sub-problem
5. Solve the algebra correctly (simple enough now)
6. Rebuild the calculus, then rebuild the physics solution
Each level of recursion makes the problem simpler, until it's easy enough to solve correctly.
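The nesting can be sketched as a toy recursion. The dict-of-steps representation and the names `physics`, `calculus`, and `algebra` are invented for illustration, and this toy re-verifies every sub-step, whereas real R-CoF only recurses into the step flagged as wrong:

```python
import operator

# A step is either a plain number (atomic, trusted) or a dict with an
# operator, sub-steps, and a possibly wrong claimed value. fix() descends
# into sub-steps before re-checking a claim, so an algebra error hiding
# inside a calculus step is repaired at the bottom and the correction
# propagates back up through the physics solution.

def fix(step):
    if isinstance(step, (int, float)):
        return step
    args = [fix(s) for s in step["subs"]]   # go deeper first
    actual = step["op"](*args)
    if step["claimed"] != actual:
        print(f"{step['name']}: {step['claimed']} -> {actual}")
    return actual

# Physics problem whose calculus step hides an algebra error (2 + 2 = 5)
problem = {"name": "physics", "op": operator.add, "claimed": 9, "subs": [
    {"name": "calculus", "op": operator.mul, "claimed": 7, "subs": [
        {"name": "algebra", "op": operator.add, "claimed": 5, "subs": [2, 2]},
        2,
    ]},
    1,
]}

print(fix(problem))  # 9
```

The algebra claim (5) and calculus claim (7) both get corrected, while the top-level claim already agrees with the rebuilt value and is left alone.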
Why This Works
The insight is that AI errors are usually local, not global. When a 10-step solution goes wrong, typically one or two specific steps contain mistakes while the rest are fine. Retrying the whole thing risks breaking the good steps. Targeted correction fixes only what's broken.
Making the sub-problem simpler is equally important. AI is much more reliable on easy, focused questions than on complex multi-step ones. By extracting "What is 5 × 2?" from a larger problem, you're playing to the model's strengths.
The Composition
Find the exact step that's wrong. Extract it as a simpler problem. Solve it. Plug the fix back in. If the sub-problem is still too hard, go deeper. Surgical precision instead of blind retrying.
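That loop fits in a small generic driver. The four callbacks (`solve`, `find_error`, `extract`, `merge`) would be LLM calls in practice; here they are hypothetical toy functions that replay the apple example so the control flow runs end to end:

```python
def r_cof(problem, solve, find_error, extract, merge, depth=3):
    """Generic R-CoF control flow; callbacks are LLM-backed in practice."""
    solution = solve(problem)
    bad = find_error(problem, solution)
    if bad is None or depth == 0:          # correct, or recursion capped
        return solution
    sub_problem = extract(solution, bad)   # isolate the broken step
    sub_solution = r_cof(sub_problem, solve,
                         find_error, extract, merge, depth - 1)
    return merge(solution, bad, sub_solution)  # plug the fix back in

# Toy callbacks replaying the apple example (purely illustrative):
def solve(p):
    return {"cost": 12, "change": 8} if p == "apples" else {"value": 10}

def find_error(p, s):
    return "cost" if p == "apples" and s["cost"] != 10 else None

def extract(s, bad):
    return "5*2"                           # the step as its own problem

def merge(s, bad, sub):
    cost = sub["value"]
    return {"cost": cost, "change": 20 - cost}

print(r_cof("apples", solve, find_error, extract, merge))
```

The `depth` cap is a practical guard: each level of recursion is extra LLM calls, so real deployments would bound how deep the decomposition may go.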
When to Use This
- Multi-step reasoning problems where errors are localized to specific steps
- Math and logic tasks where individual steps can be verified
- When you need interpretable corrections — showing what was wrong and how it was fixed
- Zero-shot self-correction without needing example data
When to Skip This
- Single-step problems — there's nothing to decompose if the problem is already atomic
- Creative tasks — when there's no objectively "correct" answer, error identification doesn't apply
- Fundamentally wrong approach — if the entire reasoning strategy is off, fixing individual steps won't help
- Real-time applications — recursive correction adds multiple LLM calls and latency
How It Relates
R-CoF is a more structured approach to the same goal as Reflexion: improving through self-correction. Reflexion works across entire episodes (attempt, reflect, retry), while R-CoF works within a single solution (find the broken step, fix just that piece). They can even be combined: Reflexion for episode-level learning, R-CoF for within-episode step-level correction.
It also relates to Least-to-Most prompting, which decomposes problems into easier sub-problems from the start. R-CoF uses decomposition after failure — only breaking things down when and where errors actually occur, rather than preemptively.