The Idea

AI is surprisingly good at understanding problems and mapping out solution steps, and surprisingly unreliable at arithmetic. Ask it to multiply 847 × 294 and it might say 248,918... or 249,218... or some other number that looks plausible but is wrong (the correct answer is 249,018).
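An interpreter has no such trouble. Python integers are arbitrary-precision, so the product below is exact. The two wrong values are the guesses quoted above, and each is off in a single digit, which is exactly why they look plausible.

```python
# Exact integer arithmetic: Python ints have arbitrary precision
product = 847 * 294
print(f"847 x 294 = {product}")

# The two plausible-looking guesses from the text, and how far off each is
for guess in (248_918, 249_218):
    print(f"guess {guess}: off by {guess - product:+}")
```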

Program of Thoughts splits the work according to what each side does best. The AI reads the problem, reasons about the approach, and writes a program with meaningful variable names and clear step-by-step comments. A code interpreter then executes that program exactly as written: no slipped digits, no "carry the one" failures, no mental-math shortcuts.
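The hand-off can be sketched in a few lines of Python. This is a minimal illustration rather than a production design. `generated_program` is a hypothetical hard-coded string standing in for model output, and `run_program` is a helper invented here that runs the code with the built-in `exec` and captures whatever it prints.

```python
import contextlib
import io

# Hypothetical stand-in for text returned by a language model. In a real
# system this string would come from a model API call, not be hard-coded.
generated_program = """
# Price after a 20% discount on a $50 item
original_price = 50
discount = 0.20
final_price = original_price * (1 - discount)
print(f"{final_price:.2f}")
"""

def run_program(source: str) -> str:
    """Execute generated code and return whatever it printed."""
    buffer = io.StringIO()
    namespace: dict = {}  # fresh globals so programs don't share state
    with contextlib.redirect_stdout(buffer):
        exec(source, namespace)
    return buffer.getvalue().strip()

answer = run_program(generated_program)
print(answer)  # 40.00
```

Because `exec` runs arbitrary code with the host process's full privileges, a real system would execute model-written programs in a sandbox rather than in-process.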

Building Blocks

This composition builds on:

Let Code Do It
Think Step by Step

Program of Thoughts combines step-by-step reasoning (expressed as code comments and variable names) with code execution for the computation, disentangling the reasoning from the calculation.

The Problem with Reasoning in Words

Chain-of-Thought (words only)

"$1,000 at 5% for 10 years with compound interest..."

"Year 1: $1,050. Year 2: $1,102.50. Year 3: $1,157.63..."

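Each year's figure is worked out in words and rounded to the cent, and every result feeds the next step. A single slipped digit anywhere in the chain carries through to year 10, and even flawless arithmetic accumulates small rounding drift along the way.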

Program of Thoughts (code)

AI writes: 1000 * (1 + 0.05) ** 10

Interpreter returns: 1628.8946..., or $1,628.89 to the cent

One expression, computed at full floating-point precision and rounded only once at the end, so no rounding errors accumulate from year to year.
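To isolate the rounding effect, the sketch below assumes perfect arithmetic and uses Python's `decimal` module with round-half-up to the cent. It compares rounding every year, as a chain of worked steps would, against the single expression rounded once.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")
principal = Decimal("1000")
rate = Decimal("0.05")
years = 10

# Chain-of-thought style: compute each year and round to the cent as you go
balance = principal
for _ in range(years):
    balance = (balance * (1 + rate)).quantize(CENT, rounding=ROUND_HALF_UP)

# Program-of-thoughts style: one expression, rounded once at the end
exact = (principal * (1 + rate) ** years).quantize(CENT, rounding=ROUND_HALF_UP)

print(f"Rounded every year: ${balance}")
print(f"Rounded once:       ${exact}")
```

Under these assumptions the year-by-year balance ends at $1,628.91, two cents above the single expression's $1,628.89. That is the best case: a genuine arithmetic slip in any one year would throw the answer off much further.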

See It in Action

Question: "A store has a 25% off sale. You have an additional 10% member discount applied after the sale price. If the original price is $240, what do you pay?"

Step 1: AI reasons about the problem and writes code
# Original price of the item
original_price = 240

# Step 1: Apply the 25% store sale
sale_discount = 0.25
after_sale = original_price * (1 - sale_discount)

# Step 2: Apply the 10% member discount to the sale price
member_discount = 0.10
final_price = after_sale * (1 - member_discount)

# Step 3: How much did you save?
total_savings = original_price - final_price

print(f"You pay: ${final_price:.2f}")
print(f"You saved: ${total_savings:.2f}")
Step 2: The code interpreter executes the program and returns the exact results:
You pay: $162.00
You saved: $78.00

Notice how the variable names tell the story: original_price, after_sale, final_price. The code is the reasoning.
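Writing the steps as code also makes an easy-to-miss fact explicit: stacked discounts multiply rather than add. The sketch below uses `Decimal` so the cents are exact. The `naive_price` comparison is added here purely for illustration and shows the effective discount is 32.5%, not 35%.

```python
from decimal import Decimal

original_price = Decimal("240")
sale_discount = Decimal("0.25")
member_discount = Decimal("0.10")

# Discounts applied one after another multiply: you pay 90% of 75%
combined_multiplier = (1 - sale_discount) * (1 - member_discount)
effective_discount = 1 - combined_multiplier

# A common slip is to add the percentages instead (25% + 10% = 35%)
naive_price = original_price * (1 - (sale_discount + member_discount))
correct_price = original_price * combined_multiplier

print(f"Effective discount: {effective_discount:.1%}")
print(f"Correct price: ${correct_price:.2f}")
print(f"Naive (added) price: ${naive_price:.2f}")
```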

Another Example: Iterative Computation

Question: "What is the sum of the squares of the integers from 1 to 100?"

Chain-of-thought would struggle here: squaring and summing 100 numbers by hand is tedious and error-prone. Program of Thoughts handles it in one line:

# Sum of squares from 1 to 100
total = sum(i**2 for i in range(1, 101))
print(total)  # 338350

Doing this mentally means a hundred squarings and a long running total, with plenty of chances to slip. Expressed as code, it becomes trivial.
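Code also makes the answer easy to cross-check. The well-known closed form for the sum of the first n squares, n(n + 1)(2n + 1) / 6, gives an independent result to compare against the brute-force sum.

```python
# Program of Thoughts: let the interpreter do the iteration
total = sum(i**2 for i in range(1, 101))

# Independent check with the closed-form formula n(n + 1)(2n + 1) / 6
n = 100
closed_form = n * (n + 1) * (2 * n + 1) // 6  # product is always divisible by 6

print(total, closed_form)
```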

Why This Works

AI and code interpreters have complementary strengths. AI understands natural language, grasps context, and knows how to approach problems. Code interpreters carry out calculations exactly as written, handle iteration effortlessly, and don't make careless arithmetic slips.

The key insight of Program of Thoughts is disentangling these two skills. By writing code with meaningful variable names and comments, the AI shows its reasoning just as clearly as it would in chain-of-thought, while the interpreter takes over the computation.

The Composition

AI reasons about the problem and writes code with meaningful variable names. A code interpreter executes it. You get the best of both: clear reasoning and reliable computation.

When to Use This

When to Skip This

How It Relates

Program of Thoughts is closely related to Let Code Do It (PAL), which also has AI write code for execution. The difference is emphasis: Program of Thoughts treats the code as a reasoning trace. The comments and variable names matter as much as the computation, because they make the AI's thinking visible and auditable.

It pairs naturally with Self-Consistency — generate multiple programs for the same problem, execute all of them, and take the majority answer. This combination can add another 2–6% accuracy on top of Program of Thoughts alone.
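A minimal sketch of that pairing, assuming the programs have already been sampled from a model. `candidate_programs` is a hypothetical, hard-coded list reusing the discount question, `run_program` is a helper invented here, and comparing printed outputs as exact strings is a simplifying assumption (real answers may need normalization before voting).

```python
import contextlib
import io
from collections import Counter
from typing import Optional

# Hypothetical programs standing in for several model samples on the same
# question. The third one contains a reasoning slip (discounts added).
candidate_programs = [
    "price = 240 * (1 - 0.25)\nprice = price * (1 - 0.10)\nprint(f'{price:.2f}')",
    "after_sale = 240 * 0.75\nfinal = after_sale * 0.90\nprint(f'{final:.2f}')",
    "print(f'{240 * (1 - 0.25 - 0.10):.2f}')",
]

def run_program(source: str) -> Optional[str]:
    """Run one program and return its printed output, or None if it fails."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, {})
    except Exception:
        return None  # a crashing program simply casts no vote
    return buffer.getvalue().strip()

# Execute every candidate, then take the most common answer
answers = [run_program(p) for p in candidate_programs]
votes = Counter(a for a in answers if a is not None)
majority_answer, count = votes.most_common(1)[0]
print(f"Majority answer: ${majority_answer} ({count} of {len(candidate_programs)} votes)")
```

Two of the three programs print 162.00, so the vote overrules the one that added the discounts together.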