Loop engineering: the term is two weeks old, the practice is over a year old

For the last two weeks my feed has been "loop engineering" this, "loop engineering" that. Addy Osmani named it on June 7. Within ten days there were follow-ups from Cobus Greyling, Lushbinary, MindStudio, Louis-François Bouchard, Kilo, Firecrawl, several YouTube videos, an Instagram reel, and a Reddit thread asking whether it is just the next buzzword. Two industry figures got cited everywhere. Boris Cherny, who leads Claude Code at Anthropic: "I don't prompt Claude anymore. I have loops running. My job is to write loops." Peter Steinberger, creator of OpenClaw: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."

I read enough of these to figure out what was being claimed, and then I had a slightly disorienting realisation. By every definition in those articles, I have been loop engineering for over a year. At one point I actually asked Claude whether we should throw some loop engineering into pi-ensemble. Claude pointed out, with the patience of someone explaining something obvious, that pi-ensemble already is loop engineering.

What the term actually means

The framing that has settled out across the articles is a three-floor stack. Prompt engineering is the ground floor: write a good prompt for a single turn. Harness engineering is the middle floor: design the environment a single agent runs inside (its tools, its context, its rubric). Loop engineering is the top floor: design the system that prompts the harness for you. It runs on a schedule. It spawns sub-agents. It verifies its own output. It decides whether to keep going. The model becomes a subroutine inside your loop, not a chat partner on the other side of a prompt box.

The four-step cycle inside the loop is the same in every article: act, observe, reason, repeat. The articles also converge on the structural ingredient that actually makes loops work: splitting the maker from the checker. The model that wrote the code is too charitable about its own output. A second agent with a different system prompt, and ideally a different model, catches what the first one talked itself into. Sub-agents in .claude/agents/ (Claude Code) and .codex/agents/ (OpenAI Codex) are the productised primitive for this. Addy Osmani makes it the centrepiece of his post. Boris Cherny describes it as how he actually works. The articles call this the heart of the practice.
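To make the maker/checker split concrete, here is a minimal Python sketch of the act, observe, reason, repeat cycle. The `maker` and `checker` callables stand in for two separately prompted agents. Their signatures, the `Review` and `LoopResult` types and the `max_iterations` cap are illustrative assumptions, not any product's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Review:
    approved: bool
    feedback: str = ""


@dataclass
class LoopResult:
    output: Optional[str]  # None if the checker never approved
    iterations: int
    history: List[Tuple[str, Review]] = field(default_factory=list)


def maker_checker_loop(
    task: str,
    maker: Callable[[str, str], str],       # hypothetical: (task, feedback) -> draft
    checker: Callable[[str, str], Review],  # hypothetical: (task, draft) -> Review
    max_iterations: int = 5,                # assumed safety cap
) -> LoopResult:
    feedback = ""
    history: List[Tuple[str, Review]] = []
    for i in range(1, max_iterations + 1):
        draft = maker(task, feedback)   # act: the maker produces a draft
        review = checker(task, draft)   # observe: a separate checker judges it
        history.append((draft, review))
        if review.approved:             # reason: stop once the checker is satisfied
            return LoopResult(draft, i, history)
        feedback = review.feedback      # repeat, feeding the checker's findings back
    return LoopResult(None, max_iterations, history)


# Toy stand-ins for two differently prompted agents.
def toy_maker(task: str, feedback: str) -> str:
    return task.upper() if "uppercase" in feedback else task


def toy_checker(task: str, draft: str) -> Review:
    return Review(True) if draft.isupper() else Review(False, "use uppercase")


result = maker_checker_loop("hello", toy_maker, toy_checker)
print(result.output, result.iterations)  # HELLO 2
```

The maker never grades itself: approval comes only from the checker, and the checker's feedback is the only thing carried into the next round.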

The pattern is real. The term is also real, and it does shift the conversation usefully: the leverage point moves from writing prompts to designing the system that writes them.

What I was already running

This is the part that was disorienting.

For the last year my main coding environment has been a forked opencode with a custom multi-agent configuration. The shape: one parent process acting as a project manager, dispatching to specialist children. Developer. Adversarial reviewer. Ops. Explore. Code-review children, one per lens (security, error handling, type safety, performance, architecture, simplicity). The PM holds the workflow state. The children do the work and report back. Nothing in the loop talks to me on a per-turn basis. I give it an issue or a directive, and it runs through plan, work, gate and review until it has either produced something to merge or hit something it cannot handle and has to escalate.
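The PM's role can be pictured as a small state machine. This is a hypothetical simplification, not the actual opencode or pi-ensemble code: it assumes each stage is handled by one child that reports plain success or failure. The `Stage` names mirror the plan, work, gate, review sequence; `run_pm` and the boolean child interface are made up for illustration.

```python
from enum import Enum
from typing import Callable, Dict, List


class Stage(Enum):
    PLAN = "plan"
    WORK = "work"
    GATE = "gate"
    REVIEW = "review"
    MERGE = "merge"
    ESCALATE = "escalate"


# The order in which the PM advances an issue.
PIPELINE = [Stage.PLAN, Stage.WORK, Stage.GATE, Stage.REVIEW]

# Hypothetical child interface: takes the issue, returns True on success.
Child = Callable[[str], bool]


def run_pm(issue: str, children: Dict[Stage, Child]) -> List[Stage]:
    """The PM owns the workflow state (here, just the trace of stages) and
    dispatches each stage to a specialist child. Any failure ends in escalation."""
    trace: List[Stage] = []
    for stage in PIPELINE:
        trace.append(stage)
        if not children[stage](issue):
            trace.append(Stage.ESCALATE)
            return trace
    trace.append(Stage.MERGE)
    return trace


children = {stage: (lambda issue: True) for stage in PIPELINE}
print([s.value for s in run_pm("issue-1", children)])
# ['plan', 'work', 'gate', 'review', 'merge']
children[Stage.GATE] = lambda issue: False
print([s.value for s in run_pm("issue-1", children)])
# ['plan', 'work', 'gate', 'escalate']
```

Real children return richer reports than a boolean; the point of the sketch is that the PM, not any individual child, decides what happens next.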

This year I rebuilt the whole thing as a clean Pi extension called pi-ensemble. Same architecture, less fork maintenance. The five slash commands cover the cycle Osmani describes almost line for line:

/start initialises the session: searches memory, indexes the codebase, gathers git/PR/CI state. Discovery.
/research fans out explore specialists in parallel. Context.
/plan drafts and classifies a GitHub issue. Intent.
/work runs the full pipeline: branch, developer, mandatory adversarial gate (up to 3 fix rounds), commit, PR, six-pass code review, CI watch, merge. Act, observe, verify, repeat.
/review runs the six-lens review on demand against any PR or path.

The maker/checker split, which Osmani calls the most useful structural element of a loop, takes the form of two separate gates in pi-ensemble. The adversarial-developer child gets the diff before any commit and tries to break it, with up to three rounds of fix-and-retry. If the diff survives that, the six lens reviewers run in parallel, each pinned to its lens, and their findings are deduplicated and precedence-merged into a verdict. A critical verdict blocks the merge unless it is explicitly overridden.
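The two gates can be sketched roughly as follows. Everything here is a hypothetical illustration rather than pi-ensemble's implementation: the severity scale, the `Finding` shape, deduplicating on (location, message), and reading "precedence-merged" as "the most severe finding wins" are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical severity scale; a higher rank takes precedence when findings clash.
SEVERITY_RANK = {"info": 0, "minor": 1, "major": 2, "critical": 3}


@dataclass(frozen=True)
class Finding:
    location: str  # e.g. "src/app.py:42"
    message: str
    severity: str  # one of SEVERITY_RANK's keys


Attack = Callable[[str], List[str]]    # returns problems found in the diff
Fix = Callable[[str, List[str]], str]  # returns a revised diff
Reviewer = Callable[[str], List[Finding]]


def adversarial_gate(diff: str, attack: Attack, fix: Fix, max_rounds: int = 3) -> Optional[str]:
    """Gate 1: attack the diff, allowing up to max_rounds of fix-and-retry.
    Returns the surviving diff, or None if it still fails after the last fix."""
    for _ in range(max_rounds):
        problems = attack(diff)
        if not problems:
            return diff
        diff = fix(diff, problems)
    return None if attack(diff) else diff


def merge_findings(per_lens: List[List[Finding]]) -> List[Finding]:
    """Deduplicate findings reported by several lenses at the same location with
    the same message, keeping the highest-severity copy."""
    merged: Dict[tuple, Finding] = {}
    for findings in per_lens:
        for f in findings:
            key = (f.location, f.message)
            current = merged.get(key)
            if current is None or SEVERITY_RANK[f.severity] > SEVERITY_RANK[current.severity]:
                merged[key] = f
    return list(merged.values())


def review(diff: str, reviewers: Dict[str, Reviewer]) -> str:
    """Gate 2: run every lens reviewer in parallel, merge the findings and return
    the verdict (the most severe remaining finding, or "clean")."""
    with ThreadPoolExecutor(max_workers=max(1, len(reviewers))) as pool:
        per_lens = list(pool.map(lambda lens: reviewers[lens](diff), reviewers))
    findings = merge_findings(per_lens)
    if not findings:
        return "clean"
    return max(findings, key=lambda f: SEVERITY_RANK[f.severity]).severity


def may_merge(verdict: str, override: bool = False) -> bool:
    """A critical verdict blocks the merge unless explicitly overridden."""
    return verdict != "critical" or override
```

Ordering the gates this way means the six-way review only runs on diffs that have already survived the adversarial pass.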

I built none of this because anyone called it loop engineering. I built it because turn-by-turn babysitting of a coding agent on hard tasks does not work, and I needed a system that could grind through real PRs without me holding its hand. The pattern emerged from the problem. I work across three screens, running up to six or seven separate sessions and burning hundreds of millions of tokens a day. That would not be possible if I had to be involved in every decision in every session.

What was actually new about the term

The pattern is older than the term. Geoffrey Huntley's "Ralph" technique (early 2026, before there was a name for any of this) is a one-line shell loop that feeds the same prompt to a fresh agent until a status file says done. The articles correctly cite Ralph as the prior art. My setup is a more structured version of the same idea, with named roles and explicit gates instead of one prompt and a status file. Many other practitioners landed on similar shapes independently. Anthropic's "Effective harnesses for long-running agents" write-up describes the same primitives. OpenAI's Symphony is a fleet-management layer over the same cycle.
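Huntley's original is a shell one-liner; a Python rendition of the idea as described here might look like this. The `run_fresh_agent` callable, the literal `done` marker and the `max_runs` safety cap are assumptions added for illustration.

```python
from pathlib import Path
from typing import Callable


def ralph_loop(
    prompt: str,
    run_fresh_agent: Callable[[str], None],  # hypothetical: starts a new agent with no prior context
    status_file: Path,
    max_runs: int = 100,                     # assumed cap; a bare shell loop has none
) -> int:
    """Feed the same prompt to a fresh agent until the status file says done.
    Returns the number of runs it took."""
    for run in range(1, max_runs + 1):
        run_fresh_agent(prompt)
        # The agent's only memory between runs is whatever it left on disk.
        if status_file.exists() and status_file.read_text().strip() == "done":
            return run
    raise RuntimeError(f"status file not 'done' after {max_runs} runs")
```

There is no checker and no roles here: the prompt itself, plus whatever state accumulates on disk, carries all of the structure.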

What the term does is consolidate a lot of small individual realisations into one named thing that the field can argue about. That is not nothing. Before the name, you had to spend a paragraph explaining what you were doing. After the name, you can point at the stack and say "this is the loop part" and most people understand. Naming things compresses the discourse, and a compressed discourse moves faster.

The other thing the term does is force the maker/checker question to the front. A lot of the early agentic coding hype was "one big agent that does everything." The loop engineering framing makes it obvious that the interesting design choices are about the structure of the loop, not the capability of the single agent. That is the right place for the leverage to be.

What the articles get wrong

Two things, mostly minor.

First, the articles tend to treat the maker/checker split as something you bolt onto a single-agent setup. In practice, the more useful framing is that the loop is multi-agent by construction. The PM is not an enhanced single agent. It is a different kind of agent, with a different job, that happens to dispatch other agents. Treating the orchestrator as first-class changes the questions you ask about the system.

Second, the cost numbers in the new posts are wild. Running a six-pass code review at frontier-model rates on every PR adds up fast. H100-in-production economics make this more defensible, but the articles tend to gloss over the operating envelope. Loop engineering only pays for itself when the loop produces something worth its token budget, and that is much harder than getting the loop to run.

What I am taking from this

Mostly that the term is useful enough that I will start using it. "Pi-ensemble is my loop engineering setup" is shorter than what I used to have to say.

The deeper thing is the same observation that comes up every time the field names a pattern that practitioners were already running. The naming compresses the discourse, but it also resets the apparent frontier. Articles dated June 7 onward get framed as "the new wave." Setups that were doing the same thing in March or April look like prior art. There is a slight unfairness in how the credit lands, and a slightly larger unfairness in how the buyer-facing narrative settles ("this just emerged"). Neither is the term's fault. The pattern is older than the name, and the people who needed the pattern figured it out before there was a name for it.

If you are reading the loop engineering articles and thinking "this looks like what I have been doing," you are probably right. The discourse caught up. That is good. Use the name. Cite the framing. And do not be surprised that the actual work (the rubrics inside the loop, the verification step, the taste calibration) did not get easier just because there is a term for the box you put it all in.

Mine are not perfect, by the way. Still tuning. Current state at github.com/randomm/pi-ensemble.