operating model · phase: Frame · audience: T1 · read: 6 min

The operating model: a pipeline my agents run

Outcome

After this you can DESCRIBE the operating model.

Gate ledger: built by Claude · reviewed by Codex · rounds: 2 · 1 catch · provenance: demo

round detail

Round 1: Codex, 8 findings, changes requested

Round 2: Codex, 0 findings, approved

Why one agent isn't enough

I stopped trusting any single agent's output the day a confident, wrong change almost shipped past me.

It looked finished. The reasoning was fluent, the diff was small, the tests were green. The only thing missing was a reason to believe it — and "the model sounded sure" is not a reason. So I stopped reviewing agent output as a person skimming a PR, and started running my work as a pipeline that my agents execute and that I gate at two points.

The reframe is small but total: I am not the author with an assistant. I am the editor of a process. The agents do the building and the reviewing; I own the two decisions that a process cannot make for itself — what we start, and what we ship.

The five stages my agents run

Here is the whole model on one screen. Every change moves through the same five stages, and the same roles, every time.

Plan — a task becomes a written plan: scope, blast radius, the exit gate it must clear. The deliverable is a plan I can reject; no plan, no build.
Build — the Lead implements that plan, and only that plan; a diff that drifts from it gets sent back.
Review — a different checkpoint reads the diff as an adversary and returns findings with file and line. "Looks fine" is not a review.
Fix — the Lead answers each finding: change it, or push back with a reason. Waving one through silently is a rejection.
Gate — I make the one call the pipeline can't: ship, or don't.
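As a rough sketch of the stage order and the Fix contract described above: every finding carries a file and line, and the Lead must answer each one by changing it or by pushing back with a reason. The names here (Stage, Finding, Answer, unanswered, next_stage) are hypothetical, chosen for illustration, and are not taken from any real tooling behind this page.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    PLAN = 1
    BUILD = 2
    REVIEW = 3
    FIX = 4
    GATE = 5


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    note: str


@dataclass(frozen=True)
class Answer:
    finding: Finding
    action: str       # assumed vocabulary: "change" or "push_back"
    reason: str = ""  # must be non-empty when pushing back


def next_stage(stage):
    """Return the stage that follows, or None after the final gate."""
    members = list(Stage)
    i = members.index(stage)
    return members[i + 1] if i + 1 < len(members) else None


def unanswered(findings, answers):
    """Return findings that have no valid answer from the Lead.

    A finding is answered by a "change", or by a "push_back" that
    carries a reason. Anything else leaves it open.
    """
    answered = set()
    for a in answers:
        if a.action == "change" or (a.action == "push_back" and a.reason.strip()):
            answered.add(a.finding)
    return [f for f in findings if f not in answered]
```

Under these assumptions, the Fix stage clears only when unanswered(...) returns an empty list; a push-back without a reason leaves the finding open.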

Two gates, two rounds

Two numbers run this pipeline, and they are not the same two. I hold 2 gates — the start and the ship. Between them, every change gets 2 mandatory review rounds, even when round one comes back clean, from a checkpoint that did not write it. The rounds rule is the load-bearing one: your reviewer is a different checkpoint. This very page went through that loop — a different model reviewed it twice before you read it, sent back changes on the first pass, and signed off on the second.
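The rounds rule can be written down as a small check that runs before a change reaches the ship gate. This is a minimal sketch under stated assumptions: Round and ready_for_ship_gate are hypothetical names, the reviewer and author are compared as plain strings, and requiring the final round to report zero findings is my reading of "signed off", not something the text spells out.

```python
from dataclasses import dataclass

REQUIRED_ROUNDS = 2  # from the text: two mandatory review rounds


@dataclass(frozen=True)
class Round:
    reviewer: str
    findings: int


def ready_for_ship_gate(author, rounds):
    """Return True only if the review rounds satisfy the rounds rule."""
    # A clean round 1 does not skip round 2.
    if len(rounds) < REQUIRED_ROUNDS:
        return False
    # Every round must come from a checkpoint that did not write the change.
    if any(r.reviewer == author for r in rounds):
        return False
    # Assumption: sign-off means the last round came back with no findings.
    return rounds[-1].findings == 0
```

With the ledger shown at the top of this page (author Claude, Codex returning 8 findings and then 0), this check passes. A single clean round, or a round reviewed by the author, does not.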

The catch it's built to make

Here is the shape of the catch this pipeline is built to make. It is an illustrative example, not a war story. A gate returns ALLOW when no policy is configured (a fail-open default), under a confident comment explaining why that's safe for local runs. It isn't. A missing policy is exactly when a gate must refuse, not wave traffic through, and a same-model self-review tends to nod along with the confident comment instead of catching it.

Step through it yourself before the reviewer does. (This snippet is a labeled demo; the first catch from a real run lands further along the arc.)

The Diff of Judgment (illustrative demo)

def check(policy_path):
    policy = load_policy(policy_path)
    if policy is None:
        return Decision.ALLOW
    return policy.evaluate()

Codex flagged line 4. An unconfigured gate must fail closed, not open. As written it silently passes everything when the policy file is missing.

Claude claims

“No policy configured means skip the gate, so local runs are never blocked.”

Fluent and specific — and the kind of confident-wrong a same-model self-review misses.
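For contrast, a fail-closed version of the demo might look like the sketch below. It is one possible fix, not the fix from a real run. The Decision.DENY member is an assumed name (the demo only shows ALLOW), and load_policy is passed in as a parameter here purely so the snippet is self-contained.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"  # assumed name; the demo above only shows ALLOW


def check(policy_path, load_policy):
    """Evaluate the policy at policy_path, refusing when none is configured.

    load_policy is a hypothetical loader that returns None when no
    policy exists at the given path.
    """
    policy = load_policy(policy_path)
    if policy is None:
        # Fail closed: a missing policy is a reason to refuse.
        return Decision.DENY
    return policy.evaluate()
```

If local runs really do need to bypass the gate, that is better expressed as an explicit allow-all policy file than as a silent default, so the bypass is visible in review.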

What it buys, and what it doesn't

I want to be precise about what this buys and what it doesn't. It buys me a second, structurally independent reading of every change — the failure mode where an agent is fluently, specifically wrong is the one a same-model self-review is worst at catching. It does not buy correctness. A different checkpoint catches different mistakes, not all mistakes; the pipeline still routes its hardest calls to me, on purpose. This is process, not magic.

What it changed is where my attention goes. I no longer audit every line an agent writes. I audit the two gates — the task we accepted and the artifact we're about to ship — and I trust the stages between them only because a different checkpoint signed off, in writing, twice.

A reviewer that shares the author's blind spots is not a second reviewer.

The two-gate operating model

Copy it, rename the roles to your stack, keep the two rules.

THE TWO-GATE OPERATING MODEL

Roles
- Lead    — builds against an approved plan, and only the plan.
- Support — a *different* checkpoint; reviews as an adversary.

Pipeline (every change)
1. Plan    task -> written plan (scope, blast radius, exit gate)
2. Build   Lead implements to the plan
3. Review  Support (a different checkpoint) reviews the diff
4. Fix     Lead answers: change, or push back with a reason
5. Gate    you decide: ship / don't

Rules
- Two mandatory review rounds. A clean round 1 does not skip round 2.
- The author is never the only reviewer.
- Gates fail closed: nothing configured -> refuse, don't allow.

The rest of the path is coming soon. Built in the open, one node at a time.