Adding an agent role is more expensive than it looks

I almost added a seventh role to pi-ensemble this week. The reasoning was plausible enough. When the adversarial gate rejects three rounds in a row with overlapping themes, the loop is signalling that the approach is wrong, not the implementation. A fresh "architect" agent could step back and propose a different frame. The PM (Project Manager) would dispatch it on cap-hit. Clean idea. Easy to specify.

I did the research before writing the prompts. The research said no.

What the data shows

The authoritative source is the MAST paper (Cemri et al., NeurIPS 2025): 1,642 execution traces across 7 popular multi-agent frameworks, 14 distinct failure modes, three categories. The headline finding is that failure rates on state-of-the-art multi-agent systems sit between 41% and 86.7%, and that "performance gains often remain minimal compared to single-agent frameworks or simple baselines like best-of-N sampling."

Two of the 14 failure modes are directly relevant to adding a role:

Disobey Role Specification: 11.8% of all failures. An agent silently behaves like a different agent. The more roles in the system, the more chances for drift.
Step Repetition: 13.2% of all failures. The orchestrator loses track of what has already been done. The orchestration prompt grows with each role; the orchestrator's grip on it does not.

Add the broader Inter-Agent Misalignment category (31-32% of failures: conversation reset, task derailment, information withholding, ignoring other agents' input, reasoning-action mismatch) and you have an empirical picture that is not subtle. Coordination is where multi-agent systems actually fail. Not capability. Not model choice. Coordination.

The cost of one more role

The intuition I want to displace is that an additional role costs one role's worth of overhead. It does not. Each new role:

Expands the orchestrator's decision space on every turn (more dispatch conditions to evaluate, more routing combinations to get right)
Dilutes instruction density in the orchestrator's prompt (the "lost in the middle" phenomenon kicks in earlier when the prompt is busier)
Adds a compression event for every handoff (the downstream agent sees the output, not the reasoning behind it)
Creates a new failure surface (every role drift is a potential bug)

The cost is paid every turn the loop runs. The benefit, in the architect-agent case, would be paid only when the adversarial gate hits a cap with thematic overlap. Low-frequency upside against constant-cost downside. The arithmetic does not work.

What I am doing instead

Doctrine, not a new role. The fix is a few paragraphs in the PM's prompt:

Watch for the pattern: three adversarial rejections with overlapping themes (not orthogonal local bugs)
When detected, dispatch the existing @explore specialist with a step-back-framed prompt: "Don't review this diff. Consider whether the whole approach is right. Given the original issue and the recurring finding pattern, is there a fundamentally different way to solve this?"
Take the result, update the spec, surface to user for approval
Re-enter from /plan with the revised spec

Zero new roles. Existing roles, different prompts. The fresh-context property I wanted from "architect" is already present in @explore (no awareness of the current diff, no role-bias toward defending it or finding bugs in it). The reframe is the prompt, not the role.

This matches what Augment Code's production Coordinator-Specialist-Verifier pattern does. It also matches the recommendation that comes out of MAST: the structural redesign is "removing agents from the coordination role entirely. Agents execute. A governed state machine coordinates." In pi-ensemble, the PM doctrine is the state machine. Doctrine changes are cheap. Roles are expensive.

This is not the first time the doctrine-not-role move has paid off in pi-ensemble. The developer agent already handles what I call the knee method: an agent learning something mid-work that suggests the spec is wrong, and ploughing on into scope it should not be in. Drew Breunig has written the clearest framing of why this matters. His Spec-Driven Development Triangle treats implementation as a feedback mechanism rather than a one-way pipeline: "the act of writing code improves the spec, and it improves the tests." The doctrine in pi-ensemble's developer prompt is the operational counterpart of that idea. Encounter something unexpected that might change the approach, stop and report to the PM, let the PM decide whether the spec needs updating. The agent is the same role. The behaviour is different because the prompt is.

The general rule

If you are tempted to add a role to a multi-agent system, ask whether the same behaviour can be achieved by a different prompt to an existing role. In my experience, the answer is yes most of the time. The exceptions are rare enough that they should be carefully argued for rather than reached for as the default move.

Less is more, in multi-agent setups as elsewhere. The empirical data agrees, which is the more interesting thing than my taste agreeing.