Same price per token. More tokens per request. On April 16, 2026, Anthropic released Claude Opus 4.7. The rate card stayed at $5 per million input tokens and $25 per million output tokens, identical to Opus 4.6. No price increase announcement. No billing change email. But Opus 4.7 ships with a new tokenizer. The same input text now produces up to 35% more tokens than it did on 4.6. More tokens at the same price per token means a higher bill per request. This is not a bug. The new tokenizer improves how the model processes text, and the quality gains in Opus 4.7 are real. But if you are budgeting based on the rate card alone, your forecasts are wrong. The numbers OpenRouter published an analysis based on over one million requests from users who switched from Opus 4.6 to 4.7. The findings: Production-scale prompts (10K+ tokens): 32 to 34% more tokens for the same input text Short prompts (under 2K tokens): 42 to 45% more tokens, but Opus 4.7 generates 62% fewer output tokens on these queries, so the net cost is actually lower Real-world cost increase: 12 to 27% higher per-request cost for prompts above 2K tokens The pattern is clear. Short prompts got cheaper because Opus 4.7 is more concise on them. Long prompts got more expensive because the tokenizer inflation dominates. Simon Willison built a token counter that compares models side by side. The same code snippet that tokenizes to 1,000 tokens on Opus 4.6 becomes 1,320 to 1,350 tokens on 4.7. The same English paragraph jumps from 500 to 580 tokens. Structured data like JSON and XML sees the highest inflation. Prompt caching absorbs some of it Anthropic's prompt caching discounts cached input tokens by 90%. Since the new tokenizer inflates all tokens equally, cached and uncached, the 90% discount means extra cached tokens cost almost nothing. OpenRouter's data shows this clearly. For prompts over 128K tokens, 93% of the extra tokens land in the cache. The effective cost increase on these long-context requests is small, around 2 to 5%. The problem is the mid-range. Prompts between 2K and 30K tokens, which is where most agentic coding turns fall, see the full 12 to 27% increase. These prompts are long enough for the tokenizer inflation to matter but often too varied for caching to help. Why this hits agentic workloads hardest Agentic coding sessions (Claude Code, Cursor, Aider, Windsurf, Codex) are exactly the workloads in the worst spot. A typical session looks like this: | Turn | Input tokens (Opus 4.6) | Input tokens (Opus 4.7) | Increase | |------|------------------------|------------------------|----------| | 10 | ~12,000 | ~15,800 | +32% | | 20 | ~22,000 | ~29,000 | +32% | | 30 | ~32,000 | ~42,200 | +32% | | 40 | ~40,000 | ~52,800 | +32% | At $5 per million input tokens, a 40-turn session that cost $4.53 on Opus 4.6 now costs roughly $5.98 on 4.7. That is a 32% increase on a session that never changed. Scale that across a team running hundreds of sessions per week and the difference is thousands of dollars per month. The retry tax compounds it further. When an agent hits a test failure at turn 35 and retries three times, each retry carries 35,000+ tokens of accumulated context. Those three retries cost $0.53 on 4.6 and $0.70 on 4.7. The tokenizer tax applies to wasted work too. The fix is not staying on 4.6 Opus 4.6 is still available, and some teams have pinned to it to avoid the cost increase. This is a short-term workaround, not a strategy. Anthropic will eventually deprecate 4.6, and the quality improvements in 4.7 are genuine. The new tokenizer exists because it helps the model perform better. The real question is: does every request in your workload need Opus at all? Where model routing changes the math The tokenizer inflation only applies to requests that hit Opus 4.7. Sonnet 4.6 and Haiku 4.5 use the same tokenizer as before. Their token counts have not changed. In a typical agentic workload, 60 to 70% of turns are low complexity: reading files, checking status, formatting output, parsing errors. These are tasks that Haiku handles correctly at $1 per million input tokens with the old tokenizer. Another 20 to 30% are medium complexity, and Sonnet handles them at $3 per million input tokens, also with the old tokenizer. Only 5 to 15% of turns actually need Opus-level reasoning. Those are the only turns that pay the tokenizer tax. Here is what that looks like on the same 40-turn session: | Segment | Turns | Model (routed) | Cost (all Opus 4.7) | Cost (routed) | |---------|-------|----------------|--------------------:|---------------:| | File reads, status checks | 25 | Haiku 4.5 ($1/M) | $3.95 | $0.45 | | Code generation, tests | 10 | Sonnet 4.6 ($3/M) | $1.45 | $0.84 | | Architecture, debugging | 5 | Opus 4.7 ($5/M) | $1.32 | $1.32 | | Total | 40 | | $6.72 | $2.61 | Without routing, the tokenizer change increased the session cost from $4.53 (all Opus 4.6) to $6.72 (all Opus 4.7), a 48% jump. With routing, the session costs $2.61, which is 42% less than the old all-Opus-4.6 baseline and 61% less than all-Opus-4.7. Routing does not avoid the tokenizer tax. It limits the tax to the requests that actually benefit from the model that charges it. Three things you can do today 1. Audit which requests actually need Opus. Pull your API logs and check how many of your requests are classification, formatting, or simple Q&A. Most teams find that 60%+ of their calls do not need a frontier model. Those calls should not be paying frontier tokenizer rates. 2. Use prompt caching on everything you can. The 90% cache discount absorbs most of the tokenizer inflation for repeated content. System prompts, tool schemas, and few-shot examples should all be cached. This is good practice regardless of routing. 3. Route per request. A trained classifier that evaluates each prompt and sends it to the cheapest capable model turns the tokenizer tax from a blanket 12 to 27% increase into a 2 to 5% increase on the subset of requests that need Opus. The classifier overhead is under 10 ms per request. The bigger picture Tokenizer changes are not one-time events. As model architectures evolve, tokenizers will change again. GPT-5.5 already had its own tokenizer adjustment earlier this year. Building a workflow that depends on stable token counts for a specific model version is fragile. Model routing insulates you from this. When the tokenizer changes, when the pricing changes, when a new model drops, the router re-evaluates each request against the current landscape. You do not need to rewrite your cost model every time a provider ships an update. Nadir evaluates each request in under 10 ms and routes to the cheapest model that can handle it. The x-nadir-cost-saved response header shows the savings on every request. Change your base URL, set model="auto", and the tokenizer tax stops being your problem. Sources: OpenRouter, "Opus 4.7's New Tokenizer: What It Actually Costs" (April 2026). Finout, "Claude Opus 4.7 Pricing: The Real Cost Story" (2026). Simon Willison, "Claude Token Counter, now with model comparisons" (April 2026). CloudZero, "Claude Opus 4.7 Pricing: What It Actually Costs" (2026). Anthropic Claude model pricing as of May 2026.