One million steps
Getting an AI agent to make a million sequential decisions without errors sounds impossible. The context window fills up, details get lost, and small mistakes compound into catastrophic failures.
There are two usual approaches, each with its own problems. You can keep a single agent running in one long context, pruning old information to stay within the window. This preserves continuity, but you're guessing about what's safe to discard. Or you can decompose the problem into specialized sub-agents, each with a focused context. This requires figuring out how to split the work and how the agents coordinate.
*Solving a Million-Step LLM Task with Zero Errors* applies agent decomposition to problems that can be broken into identical steps. The researchers picked Tower of Hanoi as their test problem: the 20-disk variant requires 1,048,575 moves (2^20 - 1) to solve optimally. Each step is simple enough to verify, but the task is unforgiving enough that one wrong move ruins everything.
The approach
The agent never sees the big picture. It solves the same small problem over and over: given the current state, produce the next move. That's it.
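Here is a minimal sketch of that subproblem in Python. The state encoding and the `ask_agent` stub are illustrative, not the paper's actual prompt or API; the point is only that the agent receives a bare state and returns a single move.

```python
# Tower of Hanoi state: three pegs, each a stack of disk sizes
# (bottom to top); a larger number means a larger disk.
State = tuple[tuple[int, ...], tuple[int, ...], tuple[int, ...]]
Move = tuple[int, int]  # (source peg index, target peg index)

def apply_move(state: State, move: Move) -> State:
    """Check a proposed move against the rules and return the next state."""
    src, dst = move
    pegs = [list(p) for p in state]
    if not pegs[src]:
        raise ValueError("no disk on the source peg")
    disk = pegs[src].pop()
    if pegs[dst] and pegs[dst][-1] < disk:
        raise ValueError("cannot place a larger disk on a smaller one")
    pegs[dst].append(disk)
    return tuple(tuple(p) for p in pegs)

def ask_agent(state: State) -> Move:
    """Hypothetical stand-in for one LLM call: prompt with only the
    current state and parse a (source, target) move from the reply."""
    raise NotImplementedError
```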
But a single attempt fails too often. So they sample the same step multiple times and vote on the answer: when most samples agree on the next move, that move is probably correct. Before voting, they filter out obviously broken outputs - samples that ramble on at length or fail to follow the output format are usually wrong.
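A sketch of that step, assuming a plain majority vote over k samples (one of several possible voting rules). `sample_step` and `parse_move` are hypothetical placeholders for one model call and one output parser:

```python
from collections import Counter

MAX_CHARS = 500  # heuristic red flag: long, rambling replies are usually wrong

def vote_on_step(state, sample_step, parse_move, k=5):
    """Sample k attempts at one step, discard red-flagged outputs,
    and return the majority move."""
    votes = Counter()
    for _ in range(k):
        raw = sample_step(state)
        if len(raw) > MAX_CHARS:   # filter: overlong, rambling response
            continue
        move = parse_move(raw)     # returns None on malformed output
        if move is None:           # filter: broken format
            continue
        votes[move] += 1
    if not votes:
        raise RuntimeError("every sample was filtered out; resample the step")
    return votes.most_common(1)[0][0]
```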
The math works because errors are scattered while correct reasoning converges: independent samples rarely agree on the same wrong move, but they reliably agree on the right one. As long as a single sample is right more often than it is wrong, voting drives the per-step error rate down exponentially with the number of samples. Sampling the same deterministic problem at nonzero temperature is what decorrelates the failures.
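Some back-of-the-envelope numbers make this concrete. Assume, purely for illustration, a 1% per-sample error rate and independent samples; the binomial bound below is pessimistic, since it counts any majority of failures as a wrong answer even when the failures disagree with each other:

```python
from math import comb

def majority_error_bound(eps: float, k: int) -> float:
    """Upper bound on majority-of-k error with per-sample error rate eps."""
    need = k // 2 + 1  # wrong votes needed to outvote the correct ones
    return sum(comb(k, i) * eps**i * (1 - eps)**(k - i)
               for i in range(need, k + 1))

steps = 2**20 - 1          # 1,048,575 moves for 20 disks
eps = 0.01                 # assumed 1% per-sample error rate
for k in (1, 5, 9, 13):
    per_step = majority_error_bound(eps, k)
    overall = (1 - per_step) ** steps
    print(f"k={k:2d}  per-step error <= {per_step:.2e}  "
          f"P(all steps correct) >= {overall:.4f}")
```

Under these assumptions a single sample essentially never survives a million steps, while nine votes per step already pushes overall success near 99%.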
The tradeoff is compute cost. Running 5-10 attempts per step gets expensive. But for critical tasks where errors are costly, redundancy beats failure.
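To put rough numbers on that tradeoff (same illustrative 1% error rate as above): the alternative to redundancy is restarting the whole run on failure, and at this scale that never terminates.

```python
from math import log10

steps = 2**20 - 1
eps = 0.01                           # assumed per-sample error rate

# Without voting: one call per step, restart the run on any error.
# P(one clean run) = (1 - eps) ** steps, roughly 10 ** -4577 here,
# so the expected number of restarts is astronomically large.
log10_p_clean_run = steps * log10(1 - eps)

# With voting: a flat k calls per step, paid exactly once.
k = 13
total_calls = k * steps              # ~13.6 million calls
print(f"log10 P(clean single-sample run) = {log10_p_clean_run:.0f}")
print(f"calls with {k}-way voting: {total_calls:,}")
```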
This pattern might work for any task with verifiable intermediate states (a generic shape is sketched after this list):
- Code generation with type checking
- Mathematical proofs
- Game playing
- Formal verification
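Every name in this sketch is illustrative; the only requirement is a cheap external check (a type checker, a proof kernel, a rules engine) that gates each step before it is committed:

```python
def run_verified(state, propose, check, apply_step, is_done):
    """Generic verified stepping loop. propose samples (and votes on) a
    candidate step; check is the external verifier; apply_step commits it."""
    while not is_done(state):
        step = propose(state)          # e.g., voted output of a small model
        if not check(state, step):     # type checker / proof kernel / rules
            continue                   # reject and resample, never commit
        state = apply_step(state, step)
    return state
```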
The approach fits naturally with atomic agents and small models focused on specific repeatable tasks.