The Idea
AI is surprisingly good at understanding problems and mapping out solution steps. It's surprisingly bad at arithmetic. Ask it to multiply 847 × 294 and it might say 248,918... or 249,218... or something else that looks plausible but is wrong. (The correct product is 249,018.)
Program of Thoughts splits the work according to what each side does best: the AI reads the problem, reasons about the approach, and writes a program with meaningful variable names and clear step-by-step comments. A code interpreter then executes the program deterministically, with no arithmetic slips, no rounding drift from step to step, and no "carry the one" failures.
Building Blocks
This composition builds on:
Let Code Do It
Think Step by Step
Program of Thoughts combines step-by-step reasoning (expressed as code comments and variable names) with code execution for the computation, disentangling the reasoning from the calculation.
The Problem with Reasoning in Words
Chain-of-Thought (words only)
"$1,000 at 5% for 10 years with compound interest..."
"Year 1: $1,050. Year 2: $1,102.50. Year 3: $1,157.63..."
Every step is a chance for a rounding or arithmetic slip, and each slip carries into all the years after it. By year 10, the answer can be noticeably wrong.
Program of Thoughts (code)
AI writes: 1000 * (1 + 0.05) ** 10
Interpreter returns: $1,628.89
One expression, evaluated at full floating-point precision and rounded once at the end. No accumulated rounding errors.
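To make the contrast concrete, here is a small sketch that computes the same balance two ways: year by year with rounding to cents at each step (roughly what a worded walkthrough does), and with the single expression above. The function names are illustrative and not taken from any library.

```python
def balance_step_by_step(principal, rate, years):
    """Grow the balance one year at a time, rounding to cents each year."""
    balance = principal
    for _ in range(years):
        balance = round(balance * (1 + rate), 2)
    return balance


def balance_closed_form(principal, rate, years):
    """Apply the compound-interest formula once, rounding only at the end."""
    return round(principal * (1 + rate) ** years, 2)


stepwise = balance_step_by_step(1000, 0.05, 10)
direct = balance_closed_form(1000, 0.05, 10)
print(f"Year by year:   ${stepwise:,.2f}")
print(f"One expression: ${direct:,.2f}")
```

When the arithmetic itself is done correctly, per-year rounding moves the result by only a few cents. The larger risk in a worded walkthrough is a wrong digit in any one of the ten multiplications, which the interpreter does not produce.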
See It in Action
Question: "A store has a 25% off sale. You have an additional 10% member discount applied after the sale price. If the original price is $240, what do you pay?"
original_price = 240
# Step 1: Apply the 25% store sale
sale_discount = 0.25
after_sale = original_price * (1 - sale_discount)
# Step 2: Apply the 10% member discount to the sale price
member_discount = 0.10
final_price = after_sale * (1 - member_discount)
# Step 3: How much did you save?
total_savings = original_price - final_price
print(f"You pay: ${final_price:.2f}")
print(f"You saved: ${total_savings:.2f}")
You pay: $162.00
You saved: $78.00
Notice how the variable names tell the story: original_price, after_sale, final_price. The code is the reasoning.
Another Example: Iterative Computation
Question: "What is the sum of the squares of the integers from 1 to 100?"
Chain-of-thought would struggle here — manually squaring and summing 100 numbers is error-prone. Program of Thoughts handles it effortlessly:
total = sum(i**2 for i in range(1, 101))
print(total)  # prints 338350
Few people or models could reliably add up 100 squares term by term in their head. But expressing it as code makes it trivial.
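A generated program can also cross-check itself. As a sketch, the snippet below compares the term-by-term sum with the standard closed-form identity n(n+1)(2n+1)/6 for the sum of the first n squares. The variable names are illustrative.

```python
n = 100

# Term-by-term sum, as in the example above.
by_iteration = sum(i**2 for i in range(1, n + 1))

# Closed-form identity for 1^2 + 2^2 + ... + n^2.
by_formula = n * (n + 1) * (2 * n + 1) // 6

assert by_iteration == by_formula
print(by_iteration)  # 338350
```

If the two values ever disagreed, the assertion would fail loudly instead of letting a wrong number through.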
Why This Works
AI and code interpreters have complementary strengths. AI understands natural language, grasps context, and knows how to approach problems. Code interpreters execute calculations exactly as written, handle iteration, and do not make the arithmetic slips a language model does.
The key insight of Program of Thoughts is disentangling these two skills. By writing code with meaningful variable names and comments, the AI shows its reasoning just as clearly as chain-of-thought — while getting perfect computation for free.
The Composition
AI reasons about the problem and writes code with meaningful variable names. A code interpreter executes it perfectly. You get the best of both: clear reasoning and flawless computation.
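The two halves of the composition can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a reference implementation: the model call is left out entirely, `generated_program` is a hard-coded stand-in for what a model might return, and `run_generated_program` is a hypothetical helper name.

```python
import contextlib
import io

# Stand-in for a model response. In practice this text would come from an LLM call.
generated_program = '''
original_price = 240
after_sale = original_price * (1 - 0.25)
final_price = after_sale * (1 - 0.10)
print(f"{final_price:.2f}")
'''


def run_generated_program(source):
    """Execute program text and return whatever it printed.

    exec() provides no isolation, so this is only suitable for trusted input.
    """
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(source, {})  # fresh globals so runs do not share state
    return captured.getvalue().strip()


answer = run_generated_program(generated_program)
print(answer)
```

The reasoning lives in the program text and the number comes from the interpreter, so the two can be inspected separately.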
When to Use This
- Math word problems requiring precise arithmetic (not approximate)
- Financial calculations: compound interest, ROI, tax computations
- Iterative or repetitive computations (sums, series, simulations)
- Data analysis tasks involving numerical transformations
When to Skip This
- No math involved: if the task is purely about language, reasoning, or creativity, there's nothing for code to compute
- No code execution available: some environments don't allow running generated code
- Security-sensitive contexts: executing AI-generated code requires sandboxing and safety measures
- Simple arithmetic: for "what's 5 + 3?", the overhead of generating and running code isn't worth it
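On the security point, one basic precaution is to run generated code in a separate interpreter process with a time limit. The sketch below uses the standard library's `subprocess.run`. The helper name `run_in_subprocess` and the five-second default are assumptions made for illustration. This alone is not a sandbox: the child process can still reach the filesystem and network, so real deployments need stronger isolation.

```python
import subprocess
import sys


def run_in_subprocess(source, timeout_seconds=5):
    """Run program text in a fresh Python process and return its stdout.

    Raises subprocess.TimeoutExpired if the program runs too long and
    subprocess.CalledProcessError if it exits with an error.
    """
    completed = subprocess.run(
        # -I starts Python in isolated mode (ignores environment variables
        # and the user site directory); it does not restrict what code can do.
        [sys.executable, "-I", "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
        check=True,
    )
    return completed.stdout.strip()


print(run_in_subprocess("print(sum(i**2 for i in range(1, 101)))"))
```

The timeout guards against runaway loops, and the separate process keeps a crash in the generated program from taking down the caller.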
How It Relates
Program of Thoughts is closely related to Let Code Do It (PAL), which also has AI write code for execution. The difference is emphasis: Program of Thoughts focuses on the code as a reasoning trace — the comments and variable names are as important as the computation, making the AI's thinking visible and auditable.
It pairs naturally with Self-Consistency — generate multiple programs for the same problem, execute all of them, and take the majority answer. This combination can add another 2–6% accuracy on top of Program of Thoughts alone.
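The majority-vote step can be sketched as follows. The three candidate programs are hard-coded stand-ins for separate model samples answering the store-discount question, and one of them deliberately adds the two discounts together to mimic a reasoning slip. The helper names are hypothetical, and `exec()` is used here only because the inputs are trusted.

```python
import contextlib
import io
from collections import Counter

# Stand-ins for several model samples. The second one is deliberately wrong:
# it adds the discounts instead of applying them one after the other.
candidate_programs = [
    "print(round(240 * (1 - 0.25) * (1 - 0.10), 2))",
    "print(round(240 * (1 - 0.25 - 0.10), 2))",
    "after_sale = 240 * 0.75\nprint(round(after_sale * 0.90, 2))",
]


def run_program(source):
    """Execute program text and return its printed output, or None on error."""
    captured = io.StringIO()
    try:
        with contextlib.redirect_stdout(captured):
            exec(source, {})
    except Exception:
        return None
    return captured.getvalue().strip()


def majority_answer(programs):
    """Run every program and return the most common non-empty output."""
    outputs = [run_program(program) for program in programs]
    votes = Counter(output for output in outputs if output)
    return votes.most_common(1)[0][0] if votes else None


print(majority_answer(candidate_programs))
```

Programs that crash or print nothing are dropped from the vote, so a single broken sample does not block an answer.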