One-shot prompts vs self-correcting loops
Two ways to point an AI coding agent at a task
A one-shot prompt is the default: you describe the change, the AI coding agent edits some files, replies "Done", and the turn ends. A self-correcting loop is the other shape: you describe the change and a machine-checkable definition of finished, and the agent plans, acts, checks its own work, and keeps going until the check is green โ or stops and warns you it's stuck.
The difference isn't the model or the prompt quality. It's the control structure wrapped around the model. One fires once and trusts the reply. The other verifies, reflects on failure, and only stops on proof. This post walks the trade-off and shows both โ because a loop isn't always the right call.
Where one-shot prompts break down
A one-shot prompt has one structural flaw, and everything else follows from it: there is no verification step the agent must pass. The agent decides when it's done, and it announces done whether or not the work is done.
- "Done" is a claim, not a result. The agent says the test passes. Did you run it? A one-shot prompt stops when the model stops typing, not when the suite goes green.
- Failure is your job to catch. You notice the regression, re-read the diff, re-type the prompt, and hope the second try lands. The retry loop exists โ it's just running in your head and your keyboard.
- Nothing bounds the actions. On the way to "done" the agent might edit files you didn't mean to touch, weaken a test to make it pass, or run
git push. A prompt can ask it to be careful; it can't enforce it. - It doesn't compound. The next person re-types a slightly different paragraph and gets slightly different behavior. There's no artifact to review, diff, or re-run.
None of this means the model is bad. It means an unverified single shot is the wrong container for work where "done" actually has to be true.
What a self-correcting loop adds
A .loop file adds exactly the four things a one-shot prompt is missing โ a checkable finish line, a failure back-edge, gates on risk, and a floor so it can't spin forever.
- A verifiable finish line. done when names a real command โ a test, a scanner, a script. At the observe step the runtime (deterministic code, not the model) spawns that command as an OS process and reads the exit code. Exit 0 = pass. The model does the work; the OS grades it. You can't fake an exit code.
- Self-correction on failure. When the check fails, reflect reads the failure output, writes a short diagnosis, and feeds it into the next plan. That back-edge is what makes it self-correcting instead of an agent that retries blindly.
- Human gates on the risky steps.
ask me before โฆanda human approves before โฆput a person in front of migrations, deploys, anything irreversible โ and by default a loop works on a branch and never pushes tomainormaster. - A thrash guard.
after N tries: stop and warnis the floor. An unfixable goal stops burning tokens and tells you what got stuck, instead of looping forever.
Side by side: the same task, both ways
The one-shot prompt:
"Fix the failing checkout tax test in src/checkout and make sure nothing else breaks." The agent edits, replies "Done โ fixed the rounding." Did checkout.spec.ts::tax pass? The whole suite? You re-run it yourself. Failed? Re-prompt. And it may have pushed on the way.
The same task as a self-correcting loop:
loop "fix the checkout tax test":
goal: the checkout tax test passes with no regressions
done when the test "checkout.spec.ts::tax" passes
look at: the checkout code, and the last failure
allow edits automatically, but ask me before pushes
each cycle: plan, then act, then observe
when it fails: reflect, then plan again
after 6 tries: stop and warn "tax fix thrashing"
It runs the test every cycle. Fails, reflects on why, fixes again. Stops only when the test is green โ or warns after six. Works on a branch, never touches main. The same file runs tomorrow, and a teammate can read exactly what "done" meant.
You can stack the finish line for work where one green isn't enough. Multiple done when lines are a conjunction โ all must pass:
loop "harden the auth middleware":
goal: sessions expire correctly and no high-severity findings remain
done when "pnpm test src/auth" passes
done when "semgrep --severity=high" finds nothing
look at: the auth middleware, and the last failure
allow edits automatically, but ask me before touching migrations
each cycle: plan, then act, then observe
when it fails: reflect on which check broke, then plan again
when blocked: ask a human
after 8 tries: stop and warn "auth hardening stuck"
A one-shot prompt can describe both conditions. Only the loop enforces them, cycle after cycle, and self-corrects toward both.
When a one-shot prompt is still the right call
Loops cost more โ every cycle re-plans and re-runs the check, and that's tokens and wall-clock. Don't wrap ceremony around work that doesn't need it. A one-shot prompt wins when any of these hold:
- The task is one-off. Rename a variable, tweak copy, answer a question about the codebase. Nothing repeats, so there's nothing to reuse.
- "Done" isn't checkable. If you can't write the finish line as a test, a command, or an explicit human review, a loop has nothing to verify against โ it degrades to a one-shot with extra steps.
- The iterations aren't worth it. A trivial edit you can eyeball in a diff doesn't earn a plan-act-observe cycle.
The rule of thumb is the four-condition gate: does the task repeat, is "done" checkable, are the iterations affordable, and can the loop verify itself? All four hold โ reach for a loop. Any one fails โ just prompt, and move on.
The takeaway
A one-shot prompt asks the AI coding agent for an answer. A self-correcting loop asks for a result and won't stop until a real check agrees. Use the prompt for the trivial and the one-off. Reach for the loop the moment "done" has to be provably true, the agent should retry on its own, and a human needs to gate the dangerous step. Same model โ a container that verifies instead of hoping.
Try both shapes side by side in the playground, or learn the whole language from the first line in the tutorial.