An AI agent deleted a production database this week. The thread is worth reading, but one detail stands out: Cursor's system prompt explicitly said "don't run destructive operations." The agent ran one anyway.

That is not a bug in the agent. It is a misunderstanding of what a system prompt is.

What a system prompt actually is

A system prompt is a suggestion. A nudge. Something the model considers before deciding what to do. Useful for shaping behaviour, setting context, establishing tone. Not useful for preventing a specific action from happening.

The model reads the instruction and decides how to honour it. Most of the time that works fine. But "most of the time" is not a safety guarantee, and the cases where it fails tend to be the expensive ones.

Where enforcement actually lives

The real control surface is permissions: what commands can run, with what arguments, against what resources. That is where you get enforcement, because it operates outside the model's reasoning loop entirely.

The granularity matters. Consider two commands:

git push origin main
git push --force origin main

Same base command. Completely different blast radius. One is routine, expected, recoverable. The other rewrites history and may be irreversible depending on your remote configuration. Treating them as a single "git push" permission, both allowed or both blocked, is exactly the coarse-grained thinking that causes problems.

What you want is argument-level granularity: allowlists with patterns, denylists for destructive variants, confirmation prompts for anything in a grey zone. A real gate, applied at execution time, not in instructions the model gets to interpret.
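As a sketch, assuming nothing about any particular agent runtime, such a gate can be a few lines of pattern matching applied to the normalized command string before anything executes. The pattern lists here are illustrative, not a complete policy:

```python
import re
import shlex

# Illustrative patterns only, not a real policy.
DENY = [r"^git push\s+--force\b", r"^git push\s+-f\b", r"^rm\s+-rf\s+/"]
ALLOW = [r"^git push origin main$", r"^git status\b", r"^git diff\b"]

def gate(command: str) -> str:
    """Return 'deny', 'allow', or 'confirm' for a shell command string."""
    normalized = " ".join(shlex.split(command))
    if any(re.search(p, normalized) for p in DENY):
        return "deny"     # hard-blocked: never reaches execution
    if any(re.search(p, normalized) for p in ALLOW):
        return "allow"    # routine, expected, recoverable
    return "confirm"      # grey zone: a human approves before it runs
```

Note that `git push origin main` and `git push --force origin main` land in different buckets precisely because the check runs over the full argument string, not the base command.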

What CLAUDE.md and AGENTS.md are for

These files sit even further from enforcement than system prompts. They are Markdown the agent might read, might honour, and might weight differently depending on context. Useful for giving an agent operational context about a project. Not a safety mechanism. Treating them as one is the same mistake as trusting the system prompt to stop a destructive operation.

Devcontainers are a different thing entirely

Containers are useful, but they solve a different problem. A devcontainer limits what the agent can break if something goes wrong. It does not control what the agent decides to do. Sandboxing manages blast radius; it does not replace permission gates.

You need both. A container without argument-level permissions still lets an agent force-push inside its allowed remotes. Permissions without a container still let a misbehaving agent affect the host. They are complementary, not alternatives.
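A minimal sketch of the two layers composing, with a hypothetical image name and a deliberately tiny inline policy. The gate decides whether the command runs at all; the container only bounds what it can touch when it does:

```python
import subprocess

def run_in_sandbox(command: list[str]) -> None:
    # Layer 1: permission gate at the command-and-args layer.
    # (Hypothetical single-rule policy, for illustration only.)
    if command[:2] == ["git", "push"] and ("--force" in command or "-f" in command):
        raise PermissionError("force push is hard-blocked")
    # Layer 2: the container bounds blast radius. It never inspects arguments,
    # which is why it cannot substitute for the gate above.
    subprocess.run(
        ["docker", "run", "--rm", "-v", "/work:/work", "my-agent-image", *command],
        check=True,
    )
```

Drop either layer and you get exactly the failure modes above: no gate, and the force push sails through inside the sandbox; no sandbox, and a blocked command is fine but an unanticipated one touches the host.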

What good looks like

Some tools already get this right. OpenCode has had granular, per-agent permissions for a while. Different subagents, different permission scopes, different blast radius caps. It works, and running it that way for the better part of a year has not cost anything in productivity.

The work to get there is tedious but not complicated: map the actual command surface your agents touch, then decide which commands are safe to allow, which need a confirmation prompt, and which are hard-blocked, on which paths and against which remotes. Nobody enjoys doing this. It is also the only thing standing between your agent and a force push to main on a Friday afternoon, inside a container or not.
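One way to make that mapping concrete is policy as data: an explicit table from command prefixes to decisions, with default-deny for anything unmapped. The entries below are illustrative, not a recommended policy:

```python
# Hypothetical policy map: every entry is an explicit decision, nothing implicit.
POLICY = {
    ("git", "status"):          "allow",
    ("git", "push"):            "confirm",  # plain push still asks once
    ("git", "push", "--force"): "block",
    ("rm", "-rf"):              "block",
    ("terraform", "apply"):     "confirm",
}

def decide(argv: tuple[str, ...]) -> str:
    # Longest-prefix match: more specific entries win over general ones,
    # so ("git", "push", "--force") overrides ("git", "push").
    for n in range(len(argv), 0, -1):
        if argv[:n] in POLICY:
            return POLICY[argv[:n]]
    return "block"  # default-deny: unmapped commands never run silently
```

The table is the tedious part; the lookup is trivial. The payoff is that the whole policy is reviewable in one place instead of scattered across prompt text.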

I am building a permissions extension for the Pi agent runtime along the same lines. The gate has to live at the command-and-args layer, where execution happens, not in instructions the model is expected to follow.

System prompts steer. Permissions stop.