How to make Claude Code self-correct
Claude Code does one pass and stops. To make it self-correct โ retry until the tests actually pass โ you need a verifiable finish line and a back-edge that feeds each failure into the next attempt. That is exactly what a .loop file is.
The problem: agents do one pass and stop
Ask Claude Code to "fix the failing checkout test" and it edits some code, runs the suite once (maybe), and replies "Done โ fixed the rounding." Sometimes it's right. Often the test is still red, or a different test broke, or it never ran the suite at all. "Done" is whatever the model said, not a command that passed.
The fix isn't a better prompt. A prompt fires once and trusts the model's word. What you want is a control loop: the agent acts, a real check grades the result, and on red the agent tries again โ with the failure in hand โ until the check is green or a ceiling stops it. That control loop is the artifact you edit, not the prose.
What "self-correcting" actually means
A self-correcting run is four repeating steps wrapped around a verdict the model can't author:
- plan โ read the scoped context, decide the smallest change toward the goal.
- act โ make the change.
- observe โ run the done when check and read pass/fail.
- reflect โ on red, diagnose why it failed and feed that into the next plan.
The load-bearing part is observe. At that step the runtime โ plain deterministic code, not the LLM โ spawns your check as a real OS process and reads the exit code from the operating system. Exit 0 = pass; anything else = fail. The model does the work; the OS grades it. That's why "done" can't be faked: the verdict never comes out of the model's mouth.
The other load-bearing part is reflect, the back-edge. Without it, a retry is just the same blind attempt again. With it, cycle two reads what broke in cycle one. That is the difference between a loop that thrashes and one that converges.
A concrete walkthrough
Say the checkout tax test is red. Here is the whole self-correcting loop โ the finish line at the top, the safety net at the bottom:
loop "fix the checkout tax test":
goal: the checkout tax test passes with no regressions
done when the test "checkout.spec.ts::tax" passes
look at: src/checkout, and the last failure
allow edits automatically, but ask me before pushes
each cycle: plan, then act, then observe
when it fails: reflect on which layer broke, then plan again
after 6 tries: stop and warn "tax fix thrashing"
Every cycle runs the named test. Red โ reflect writes a one-line diagnosis ("rounding happens before the discount, not after") โ that becomes context for the next plan. The loop stops only when the process your machine ran returns 0. It works on a branch and, with no git: block, never pushes to main.
Two lines earn their keep. look at: ends with and the last failure โ reflect's diagnosis only reaches the next plan if you pass it forward. And after 6 tries is the floor: a back-edge with no ceiling is a token bonfire. Never write one without the other.
The check doesn't have to be a test. Any command that exits non-zero on failure works, and you can list several โ all must pass:
loop "harden the auth middleware":
goal: rate limiting works and no high-severity findings remain
done when "pnpm test src/auth" passes
done when "semgrep --severity=high" finds nothing
look at: src/auth, and the last failure
each cycle: plan, then act, then observe
when it fails: reflect, then plan again
after 8 tries: stop and warn "auth loop stuck"
finds nothing means "this scanner must report zero" โ exit 0 and empty output. Stacking a test with a scan is how you make one loop prove both "it works" and "it's clean."
How to run it
Install once, then two ways to drive it:
npx @loop-lang/loop init # writes the /loopflow skill + AGENTS.md into your repo
- In a Claude Code chat โ
/loopflow run fix-tax.loop. You watch every plan โ act โ observe โ reflect step and answer human gates inline. Withinit, Claude also reaches for a.loopon its own when work is repeatable and verifiable โ you often don't type the slash at all. - Headless โ
loop-run run fix-tax.loopin a terminal or CI. Print the shape first withloop-run show fix-tax.loop; a missing done-when or thrash guard is obvious in the ASCII.
A .loop is plain text, so the same file works in Cursor or Copilot โ point the agent at AGENTS.md and it reads the goal, the check, and the stopping rules the same way Claude Code does.
Common pitfalls
- No real check. A loop with no
done whenhas nothing to self-correct against โ it just runs to the cap. Write the check before the behavior; if you can't, you don't understand the goal yet. - A test runner that exits 0 on failure. The check runs with your privileges in your shell. If it lies about success, the loop lies to you. Run the command by hand once before trusting it.
- Dropping the failure. Forget
and the last failureand every cycle re-plans blind. That's a retry, not self-correction. - A back-edge with no ceiling.
reflect, then plan againwithoutafter N triescan burn tokens on an unfixable goal. The engine caps every loop at 25 cycles regardless, but set your own floor with a message that tells future-you what got stuck. - Unbounded scope.
look at: the whole repomakes the agent wander and rewrite working code. Name the three files that matter.
Wrap-up
Making Claude Code self-correct isn't a trick โ it's structure. A goal, a check the OS grades, a reflect back-edge, and a try ceiling turn one-shot prompting into a loop that retries until the tests pass and stops when they do. Write it once, re-run it forever, review it in a PR.
Try it now in the playground, or learn the whole language line by line in the tutorial.