You feeling the squeeze yet?
The era of predictable AI costs is over. Flat-rate subscriptions are disappearing from enterprise tiers, usage-based billing is replacing them, and the per-token rates underneath are going up. If your team is doing anything agentic, this is not an abstract pricing discussion.
## The numbers
OpenAI moved first. GPT-5.2 is up 40% across the board:
| | Old | New |
|---|---|---|
| Input | $1.25/M tokens | $1.75/M tokens |
| Output | $10.00/M tokens | $14.00/M tokens |
Anthropic's April enterprise changes are sharper. Before: $200/user/month, flat, with discounted tokens included. After: $20/seat base plus per-token billing at full API rates plus a monthly spending commitment. For some customers that is a 3x cost increase, not a 40% one.
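A back-of-envelope sketch shows how that 3x arises. The plan structure ($200 flat before, $20 seat plus full API rates after) is from the figures above; the per-token rates and monthly token volumes are hypothetical placeholders for a heavy agentic user.

```python
# Old flat enterprise plan vs. new seat-plus-usage model.
# Plan shape is from the text; rates and volumes are HYPOTHETICAL.

OLD_FLAT_PER_USER = 200.00   # $/user/month, discounted tokens included
NEW_SEAT_PER_USER = 20.00    # $/user/month base under the new model

# Assumed full API rates for a frontier model, $/M tokens (illustrative)
INPUT_RATE, OUTPUT_RATE = 15.00, 75.00

def new_plan_cost(input_mtok: float, output_mtok: float) -> float:
    """Monthly cost per user: seat fee plus per-token billing at full rates."""
    return NEW_SEAT_PER_USER + input_mtok * INPUT_RATE + output_mtok * OUTPUT_RATE

# A heavy agentic user burning ~30M input / 2M output tokens a month
heavy = new_plan_cost(30, 2)
print(f"old: ${OLD_FLAT_PER_USER:.2f}  new: ${heavy:.2f}  "
      f"ratio: {heavy / OLD_FLAT_PER_USER:.1f}x")
```

Under those assumed volumes the new plan lands a little over 3x the old flat rate, which is why heavy users feel this change far more than the headline 40% suggests.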
There is also a quieter change. Claude Opus 4.7 shipped a new tokenizer that produces 1.0 to 1.35 times as many tokens for the same input text. The published rate card did not change, but the effective cost still went up by as much as roughly 35%.
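The tokenizer effect is easy to model: the rate stays fixed while the token count for the same text inflates. The 1.35x multiplier is from the text; the per-token rate and baseline volume below are illustrative.

```python
# Effective-cost multiplier from a tokenizer change: the published rate
# is unchanged, but the same text now maps to more tokens.
# The 1.35x inflation is from the text; RATE and volume are ILLUSTRATIVE.

RATE = 15.00  # $/M input tokens (illustrative frontier rate, unchanged)

def effective_cost(mtok_old_tokenizer: float, inflation: float) -> float:
    """Cost of the same text after the tokenizer inflates token counts."""
    return mtok_old_tokenizer * inflation * RATE

before = effective_cost(10, 1.00)  # 10M tokens under the old tokenizer
after = effective_cost(10, 1.35)   # the same text, new tokenizer
print(f"${before:.2f} -> ${after:.2f} (+{after / before - 1:.0%})")
```

Same rate card, same text, 35% more spend at the top of the range: a price increase that never appears on a pricing page.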
## Why agentic workflows feel this differently
A developer chatting with an LLM uses tokens in bursts. An agent running in a loop uses them continuously, across multiple subagents, with long context windows, on every turn. Claude Code weekly active users doubled between January and February 2026 alone. That kind of growth is exactly why providers are raising prices, and it is exactly why those raises hit agentic teams hardest.
Average Codex cost is now being quoted at $100 to $200 per developer per month. Multiply that across a team of 50 and you are looking at $5,000 to $10,000 a month: a real budget line, not a rounding error.
## What changes because of this
Budget predictability is gone. Monthly spending commitments create a new category of financial risk that your infrastructure team is probably not set up to manage yet. API billing is flexible but has no ceiling.
Single-model strategies are becoming expensive bets. The cost spread between frontier models and capable cheaper alternatives has widened enough to make routing worth the engineering effort. xAI Grok 4.1 at $0.20/M input and $0.50/M output exists. Haiku 4.5 exists. Not every task in your agent pipeline needs the most expensive model.
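A minimal sketch of what that routing might look like. The rates for Grok 4.1 and GPT-5.2 input tokens are taken from the figures in this post; the complexity score and threshold are hypothetical stand-ins for whatever task classifier your pipeline already has.

```python
# Cost-aware model routing: send simple agent steps to a cheap model and
# reserve the frontier model for hard ones. Input rates ($/M tokens) are
# from the text; the complexity heuristic and threshold are HYPOTHETICAL.

MODELS = {
    "cheap":    {"name": "grok-4.1", "input_rate": 0.20},
    "frontier": {"name": "gpt-5.2",  "input_rate": 1.75},
}

def route(task_complexity: float, threshold: float = 0.7) -> dict:
    """Pick a model tier from an upstream complexity estimate in [0, 1]."""
    tier = "frontier" if task_complexity >= threshold else "cheap"
    return MODELS[tier]

# e.g. a formatting subtask vs. a multi-file refactor
print(route(0.2)["name"], route(0.9)["name"])
```

Even a crude threshold like this pays off when the cost spread between tiers is nearly 9x on input tokens alone.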
Prompt caching and batch processing stopped being optional optimizations around the same time these price increases landed. Agentic loops that minimize round-trips are no longer just good practice. They are financially material.
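To see why caching is financially material in a loop, consider re-sending the same large context on every turn. The sketch below assumes cached input tokens bill at 10% of the base rate, which is a common discount shape but not universal; check your provider's actual caching terms. All numbers here are illustrative.

```python
# Rough savings estimate from prompt caching in an agentic loop.
# ASSUMES cached input tokens bill at 10% of the base rate -- a common
# discount shape, but verify against your provider's actual terms.

BASE_RATE = 1.75          # $/M input tokens (illustrative)
CACHE_READ_RATE = 0.175   # assumed 10% of base for cache reads

def loop_cost(turns: int, context_mtok: float, cache_hit: float) -> float:
    """Input cost of re-sending a large context on every turn of a loop."""
    cached = context_mtok * cache_hit * CACHE_READ_RATE
    fresh = context_mtok * (1 - cache_hit) * BASE_RATE
    return turns * (cached + fresh)

# 100-turn loop re-sending a 100K-token context each turn
no_cache = loop_cost(turns=100, context_mtok=0.1, cache_hit=0.0)
cached = loop_cost(turns=100, context_mtok=0.1, cache_hit=0.9)
print(f"no cache: ${no_cache:.2f}  90% cache hits: ${cached:.2f}")
```

Under these assumptions a 90% cache-hit rate cuts the loop's input cost by roughly 80%, and the effect compounds with every subagent you run.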
## Open-weight models are a real option
For programmer and devops agents, Minimax M2.7 is capable enough to carry the workload. For orchestration, GLM-5.1 and Qwen3.6 (the plus variant) are worth a serious look. For teams with full air-gap requirements, Qwen3.6 MoE self-hosted removes the API dependency entirely. No token bills, no data leaving your infrastructure, no rate limits.
The tradeoff is operational: you are running infrastructure, managing updates, owning the uptime. For organisations facing six-figure annual API commitments, that tradeoff is starting to look different than it did twelve months ago.
## The structural point
These increases are not a temporary spike that will settle back down. The compute demand from agentic development is real and growing, providers are capacity-constrained, and they are pricing accordingly. The organisations that treat AI infrastructure costs as a first-class architectural concern now will be in a better position than those that treat them as a line item to revisit later.
So: are you feeling the squeeze yet? If not, you probably will be.