Blog / Make Claude Code self-correct

How to make Claude Code self-correct

Claude Code does one pass and stops; a .loop gives it a verifiable finish line and a reflect back-edge so it retries until the tests actually pass.

Claude Code does one pass and stops. To make it self-correct — retry until the tests actually pass — you need a verifiable finish line and a back-edge that feeds each failure into the next attempt. That is exactly what a .loop file is.

The problem: agents do one pass and stop

Ask Claude Code to "fix the failing checkout test" and it edits some code, runs the suite once (maybe), and replies "Done — fixed the rounding." Sometimes it's right. Often the test is still red, or a different test broke, or it never ran the suite at all. "Done" is whatever the model said, not a command that passed.

The fix isn't a better prompt. A prompt fires once and trusts the model's word. What you want is a control loop: the agent acts, a real check grades the result, and on red the agent tries again — with the failure in hand — until the check is green or a ceiling stops it. That control loop is the artifact you edit, not the prose.

What "self-correcting" actually means

A self-correcting run is four repeating steps wrapped around a verdict the model can't author:

plan — read the scoped context, decide the smallest change toward the goal.
act — make the change.
observe — run the done when check and read pass/fail.
reflect — on red, diagnose why it failed and feed that into the next plan.

The load-bearing part is observe. At that step the runtime — plain deterministic code, not the LLM — spawns your check as a real OS process and reads the exit code from the operating system. Exit 0 = pass; anything else = fail. The model does the work; the OS grades it. That's why "done" can't be faked: the verdict never comes out of the model's mouth.

The other load-bearing part is reflect, the back-edge. Without it, a retry is just the same blind attempt again. With it, cycle two reads what broke in cycle one. That is the difference between a loop that thrashes and one that converges.

A concrete walkthrough

Say the checkout tax test is red. Here is the whole self-correcting loop — the finish line at the top, the safety net at the bottom:

loop "fix the checkout tax test":
  goal: the checkout tax test passes with no regressions
  done when the test "checkout.spec.ts::tax" passes

  look at: src/checkout, and the last failure
  allow edits automatically, but ask me before pushes

  each cycle: plan, then act, then observe
  when it fails: reflect on which layer broke, then plan again
  after 6 tries: stop and warn "tax fix thrashing"

Every cycle runs the named test. Red → reflect writes a one-line diagnosis ("rounding happens before the discount, not after") → that becomes context for the next plan. The loop stops only when the process your machine ran returns 0. It works on a branch and, with no git: block, never pushes to main.

Two lines earn their keep. look at: ends with and the last failure — reflect's diagnosis only reaches the next plan if you pass it forward. And after 6 tries is the floor: a back-edge with no ceiling is a token bonfire. Never write one without the other.

The check doesn't have to be a test. Any command that exits non-zero on failure works, and you can list several — all must pass:

loop "harden the auth middleware":
  goal: rate limiting works and no high-severity findings remain
  done when "pnpm test src/auth" passes
  done when "semgrep --severity=high" finds nothing

  look at: src/auth, and the last failure
  each cycle: plan, then act, then observe
  when it fails: reflect, then plan again
  after 8 tries: stop and warn "auth loop stuck"

finds nothing means "this scanner must report zero" — exit 0 and empty output. Stacking a test with a scan is how you make one loop prove both "it works" and "it's clean."

How to run it

Install once, then two ways to drive it:

npx @loop-lang/loop init      # writes the /loopflow skill + AGENTS.md into your repo

In a Claude Code chat — /loopflow run fix-tax.loop. You watch every plan → act → observe → reflect step and answer human gates inline. With init, Claude also reaches for a .loop on its own when work is repeatable and verifiable — you often don't type the slash at all.
Headless — loop-run run fix-tax.loop in a terminal or CI. Print the shape first with loop-run show fix-tax.loop; a missing done-when or thrash guard is obvious in the ASCII.

A .loop is plain text, so the same file works in Cursor or Copilot — point the agent at AGENTS.md and it reads the goal, the check, and the stopping rules the same way Claude Code does.

Common pitfalls

No real check. A loop with no done when has nothing to self-correct against — it just runs to the cap. Write the check before the behavior; if you can't, you don't understand the goal yet.
A test runner that exits 0 on failure. The check runs with your privileges in your shell. If it lies about success, the loop lies to you. Run the command by hand once before trusting it.
Dropping the failure. Forget and the last failure and every cycle re-plans blind. That's a retry, not self-correction.
A back-edge with no ceiling. reflect, then plan again without after N tries can burn tokens on an unfixable goal. The engine caps every loop at 25 cycles regardless, but set your own floor with a message that tells future-you what got stuck.
Unbounded scope. look at: the whole repo makes the agent wander and rewrite working code. Name the three files that matter.

Wrap-up

Making Claude Code self-correct isn't a trick — it's structure. A goal, a check the OS grades, a reflect back-edge, and a try ceiling turn one-shot prompting into a loop that retries until the tests pass and stops when they do. Write it once, re-run it forever, review it in a PR.

Try it now in the playground, or learn the whole language line by line in the tutorial.

LoopFlow · the tutorial · more posts · GitHub. Every example here is a real, runnable .loop.