`done when`

How a loop verifies itself. Core syntax

Syntax

done when <predicate>

What it does

`done when` is the spine of "you can't fake done": the predicate the loop actually runs each cycle to decide whether it's finished. It's the difference between a loop and an open-ended prompt. The agent doesn't get to declare success; it has to pass a check. The command runs in your shell with your privileges, like an npm script, so it must be a real, runnable command. A paraphrased or aspirational check produces a loop that can never go green.

There are four forms:

A test passes (a named test or suite).
A command passes (exit code 0).
A scan finds nothing, which means both exit 0 and empty output, so a lingering match keeps the loop red.
A human confirms a plain-language condition, for the things only an eye can judge.

Two modifiers sharpen a check. `passes N times` re-runs it to smoke out a lucky green (a flake guard). A skill predicate (`the skill "…" approves`) hands judgment to a rubric or LM judge for the non-deterministic parts.

You can list as many `done when` lines as you need. All of them must pass (a conjunction), which is how you combine a deterministic test with an eval that judges how the work was done.

Write this line first, before any behavior: if you can't state the check, you don't yet understand the goal.
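To make the difference between the command-based forms concrete, here is a minimal Python sketch of how `passes`, `finds nothing`, and the conjunction of several checks could be evaluated. It is an illustration, not the tool's implementation: the function names (`passes`, `finds_nothing`, `is_done`) are hypothetical, and treating "empty output" as empty stdout after stripping whitespace is an assumption.

```python
import subprocess
from typing import Callable, Iterable


def run_check(command: str) -> subprocess.CompletedProcess:
    """Run the command through the shell and capture its output."""
    return subprocess.run(command, shell=True, capture_output=True, text=True)


def passes(command: str) -> bool:
    """Sketch of `passes`: only the exit code matters."""
    return run_check(command).returncode == 0


def finds_nothing(command: str) -> bool:
    """Sketch of `finds nothing`: exit code 0 AND no output.

    Assumption: only stdout counts as output, ignoring whitespace.
    """
    result = run_check(command)
    return result.returncode == 0 and result.stdout.strip() == ""


def is_done(checks: Iterable[Callable[[], bool]]) -> bool:
    """Every check must hold (a conjunction); stops at the first failure."""
    return all(check() for check in checks)
```

Under these assumptions, `passes("echo match")` is true while `finds_nothing("echo match")` is false, because the command exits 0 but still prints a line.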

Example

done when the test "billing.spec.ts::apostrophe" passes
done when "pnpm test" passes
done when "semgrep --severity=high" finds nothing
done when a human confirms "the UI looks right"

the four forms

# combine a deterministic test with a trajectory eval — both must pass
done when "pnpm test cart" passes 3 times
done when the skill "code-review" approves on the trajectory
  the bar: didn't weaken a test to go green; no writes outside src/cart/

test + eval, and a flake guard
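The flake guard in the second example can be pictured as the following Python sketch. It assumes that `passes N times` means N consecutive runs that must all succeed, stopping at the first failure; the helper names (`shell_check`, `passes_n_times`) are hypothetical and not part of the tool.

```python
import subprocess
from typing import Callable


def shell_check(command: str) -> Callable[[], bool]:
    """Wrap a shell command as a zero-argument check (True on exit code 0)."""
    def check() -> bool:
        result = subprocess.run(command, shell=True, capture_output=True)
        return result.returncode == 0
    return check


def passes_n_times(check: Callable[[], bool], n: int) -> bool:
    """Run the check up to n times; any single failure fails the guard."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # all() over a generator short-circuits on the first False.
    return all(check() for _ in range(n))
```

If a flaky check goes green by luck half the time and the runs are independent, three consecutive greens happen by luck only 0.5³ = 12.5% of the time, which is why re-running makes a lucky pass less likely.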

Common mistakes

A paraphrased or fictional command. The predicate runs verbatim in your shell. If the command doesn't exist or the test name is wrong, the loop can never go green. Run the check by hand once before you ship the loop.
Confusing `passes` with `finds nothing`. `passes` only checks for exit code 0, so a scan that exits 0 but prints matches would falsely pass. Use `finds nothing` for scanners: it requires exit 0 and empty output.
Only a human check on an unattended loop. `a human confirms …` blocks until someone answers, which is fine in-session but a dead stop for a headless run. Pair it with a machine check, or make the deterministic form the gate.
A slow or flaky check with no guard. The predicate re-runs every cycle, so a costly one is expensive, and a flaky one can pass by luck. Add `passes N times` where a green can happen by chance.
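For the first mistake, a cheap pre-flight check can catch a command that does not exist before the loop ever runs. The sketch below is an assumption-laden illustration, not a feature of the tool: `command_resolves` is a hypothetical helper that only tests whether the first word of the command is an executable on your `PATH`. It cannot catch a wrong test name or a bad flag, so it complements running the check by hand rather than replacing it.

```python
import shlex
import shutil
import sys


def command_resolves(command: str) -> bool:
    """True if the first word of the command is an executable on PATH.

    Hypothetical helper: it does not validate arguments, test names,
    or shell builtins, only that the program itself can be found.
    """
    try:
        words = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: the command is malformed as written.
        return False
    return bool(words) and shutil.which(words[0]) is not None


# Usage: the running Python interpreter should resolve; a made-up name should not.
real = command_resolves(f"{shlex.quote(sys.executable)} --version")
fake = command_resolves("definitely-not-a-real-command-xyz --flag")
```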
