Three billing changes in 30 days. The flat-rate era is officially over. On June 1, 2026, GitHub flipped every Copilot plan to usage-based billing. Instead of counting "premium requests," each plan now includes a monthly allotment of AI Credits pegged to token consumption. Copilot Pro+ gets $39 in credits. Business gets $19 per seat. Enterprise gets $39 per seat. When the credits run out, you pay per token or you stop. Source: GitHub Blog, "GitHub Copilot is moving to usage-based billing," June 2026 Source: GitHub Changelog, "Updates to GitHub Copilot billing and plans," June 2026 On June 15, 2026, Anthropic separates Claude Code's Agent SDK and headless (claude -p) usage from subscription limits. A new monthly credit pool replaces what was effectively unlimited agent access: $20 for Pro, $100 for Max 5x, $200 for Max 20x. The credits meter at full API rates. No rollover. Source: TechTimes, "Anthropic Ends Subscription Subsidy for Agents June 15: Credit Pool Replaces Flat-Rate Access," June 2026 Source: InfoWorld, "Anthropic puts Claude agents on a meter across its subscriptions," 2026 And in April, OpenAI updated Codex pricing to align with API token usage instead of per-message billing, extending the change to all Enterprise plans by April 23. Source: OpenAI Developers, "Pricing - Codex" Three platforms. Three billing overhauls. The same direction: every token now has a visible price tag. The subsidy math never worked at scale. The reason all three moved simultaneously is straightforward. Agentic coding workflows consume orders of magnitude more tokens than chat, and the flat-rate subscriptions were never priced for that volume. A $20 Claude Pro subscription was giving heavy Agent SDK users access to $300 to $600 worth of API-equivalent compute. That is a 15x to 30x subsidy. At chat-era usage levels, the subsidy was invisible. At agentic usage levels, it broke the model. Source: ExplainX, "The Claude Token Economy: A Deep Dive into Dedicated Programmatic Credits and the Future of Agentic Labor," 2026 GitHub saw the same dynamic. Developers reported burning through a month of Copilot credits in hours under the new billing system. Heavy agentic users modeled cost increases from $29 per month to $750 per month, a 10x to 50x jump. Source: The Register, "Angry devs vow to flee GitHub Copilot as metered billing takes hold," June 2026 OpenAI's Codex averages $100 to $200 per developer per month, with significant variance depending on model selection and usage patterns. Source: Verdent Guides, "Codex Pricing in 2026: Credits, Token Rates, and Limits" The numbers tell the same story across all three platforms. Flat-rate pricing worked when AI coding meant autocomplete. Agentic workflows that spin up dozens of tool calls, read entire codebases, and iterate through multi-step plans consume 1,000x more tokens than a code completion. The subscription model could not absorb that volume at those prices. Source: Stanford/Microsoft Research, "How Do AI Agents Spend Your Money?", arXiv:2604.22750, April 2026 $20 in Claude credits runs out faster than you think. At Sonnet 4.6 pricing ($3 per million input tokens, $15 per million output tokens), $20 covers roughly 6.6 million input tokens or 1.3 million output tokens. That sounds like a lot until you measure what a coding agent session actually consumes. A single agentic session with a large context window burns 100,000 to 300,000 tokens. A prompt with several retries can hit 300,000 tokens in one exchange. At that rate, $20 covers somewhere between 7 and 60 sessions per month, depending on complexity and context size. For teams running CI pipelines, automated PR reviews, or scheduled agent tasks through the Agent SDK, the credit runs out in days, not weeks. The overflow option exists: enable "usage credits" and pay at standard API rates for anything above the monthly allowance. But that is exactly the scenario where per-token cost optimization matters most. Source: FindSkill.ai, "Claude Code Pricing After June 15: The Decision Table" Source: CloudZero, "Claude Code Pricing In 2026: Plans, Token Costs, And What It Actually Costs to Use" GitHub's credit math is similar. 1 AI Credit equals $0.01 USD. A $39 Copilot Pro+ plan includes 3,900 credits. Agentic mode on a frontier model can consume the entire month's allowance in a single afternoon of heavy use. Source: GitHub Docs, "Usage-based billing for individuals" Most coding agent calls do not need a frontier model. The critical insight is not that metered billing is expensive. It is that most of the metered tokens are going to the wrong model. Augment Code published a routing guide in 2026 showing that three-tier Claude routing saves 51% compared to uniform Opus deployment. Their breakdown maps the coding agent workflow to model tiers: Opus 4.6 for coordination decisions that cascade through every downstream agent Sonnet 4.6 for high-volume implementation work (79.6% SWE-bench score at $3/MTok) Haiku 4.5 for the hundreds of file operations that need speed over reasoning depth GPT-5.2 for async code review that benefits from exhaustive investigation Source: Augment Code, "Best AI Model for Coding Agents in 2026: A Routing Guide" Sonnet handles 80% to 90% of coding tasks at the same quality as Opus. The remaining 10% to 20%, complex architecture decisions and multi-file refactors with subtle dependencies, genuinely benefit from Opus-level reasoning. This matches the broader production data. Datadog measured that 69% of all LLM input tokens are system prompts, tool schemas, and policy definitions that repeat on every call. The AICC's analysis of 2.4 billion API calls found that organizations with intelligent routing pay $2.31 per million tokens versus $18.40 without. Source: Datadog, "State of AI Engineering 2026" Source: AICC, "Enterprise Token Costs Drop 67% Year-Over-Year as Multi-Model AI Adoption Hits Record High," May 2026 When every token was free (or felt free under a flat-rate plan), nobody measured this. Now that every token has a price, the waste is visible. The new math: what routing does to your metered bill. Let us run the numbers on a developer spending $200 per month on Claude Code under the new metered system (a Max 20x plan using its full credit pool). Without routing (current default): All agent calls go to Opus 4.8 at $5/$25 per million tokens. Total: $200/month. With three-tier routing: 60% of calls to Haiku at $1/$5: $40/month 30% of calls to Sonnet at $3/$15: $42/month 10% of calls stay on Opus at $5/$25: $20/month Total: $102/month. Savings: $98/month per developer. That $200 Max credit now stretches to cover nearly double the usage. Or the same usage costs half as much. For a 50-developer team, the monthly savings are $4,900. The same math applies to Copilot. A $39 Pro+ plan with 3,900 credits goes further when simple file reads and boilerplate generation route to a cheaper model instead of burning credits at frontier rates. GitHub's credit system charges based on the model's API price, so routing to a cheaper model directly reduces credit consumption per request. For teams on overflow billing (paying API rates above the monthly credit), the savings are even larger. Every request above the credit threshold is pure API spend. Routing those overflow requests captures the full 5x spread between Haiku and Opus on every call. Cursor already figured this out. Cursor moved to usage-based billing in June 2025, a full year before GitHub and Anthropic followed. Their response was architectural: they built Composer 2.5, their own model optimized for agentic coding, and made it the default "Auto" selection. Using Auto provides significantly more included usage than selecting a frontier model manually. Source: Cursor Docs, "Models & Pricing" The logic is clear. When every token is metered, the vendor that routes to the cheapest capable model gives the user the most value per dollar. Cursor built the routing into the product. The question for Claude Code and Copilot users is whether they wait for Anthropic and GitHub to do the same, or route at the API layer now. Cached tokens are the other half of the equation. Every coding agent provider now offers cached token pricing, and the discounts are substantial. Anthropic charges roughly 10% of the standard input rate for cached tokens. OpenAI offers similar discounts on Codex cached inputs. Datadog's finding that 69% of input tokens are repeated system prompts means caching alone can cut input costs by over 60% on those tokens. Combined with routing, the compound savings reach 70% to 80%. Source: Datadog, "State of AI Engineering 2026" Yet only 28% of teams use prompt caching. That number was measured before metered billing. Now that every cached token saves real money against a visible credit balance, expect adoption to accelerate. But caching only reduces the cost of repeated content. Routing reduces the cost of new content. You need both. Five things to do before June 15. 1. Measure your per-session token consumption. Pull your API logs or Claude Code usage data. How many tokens does a typical coding session consume? How much of that is system prompts, tool schemas, and file reads versus actual reasoning? The answer determines how much metered billing will cost you. 2. Classify your agent calls by complexity. Tag a sample of 100 coding agent interactions. File reads, linting, formatting, simple refactors, test generation from templates: these are Haiku-tier tasks. Architecture decisions, complex debugging, multi-file refactors with dependency analysis: these need Opus. Most teams find 60% to 70% of their agent calls are in the cheap tier. 3. Enable prompt caching. If your coding agent sends the same system prompt, tool schemas, or project context on every call, you are paying full price for 69% of your tokens. Anthropic and OpenAI both offer automatic prefix caching. Enable it before metered billing starts. 4. Route per call, not per application. Stop hardcoding model selection. A classifier that evaluates each agent call independently and routes to the cheapest capable model captures the 5x pricing gap between Haiku and Opus on the majority of coding tasks. This is not a workflow change. It is a base URL change. 5. Set overflow alerts. If you enable usage credits (overflow billing) on Claude Code, set a monthly cap. The difference between a $200 plan and a $2,000 bill is one heavy agent session without limits. The market is moving toward routing as default. IDC predicts 70% of top AI enterprises will use dynamic model routing by 2028. The metered billing shift accelerates this timeline. When every token was subsidized, routing was an optimization. Now that every token is metered, routing is cost control. Source: IDC Blog, "Why the Future of AI Lies in Model Routing," November 2025 The enterprise AI cost crisis is well documented. Uber exhausted its 2026 AI budget in four months. Microsoft cancelled Claude Code licenses. An unnamed enterprise ran up a $500 million Anthropic bill in 30 days. The FinOps Foundation reports 98% of teams now manage AI spend, up from 31% two years ago. Source: FinOps Foundation, "State of FinOps 2026 Report" Source: FourWeekMBA, "The Enterprise AI Cost Crisis: $9-19M/Year and Nobody Can Prove the ROI," 2026 Metered billing does not create the cost problem. It makes the cost problem visible. And visibility is the prerequisite for optimization. Where Nadir fits. Nadir sits between your coding agent and the model provider. The trained classifier evaluates each API call in under 10 ms and routes to the cheapest model that can handle it. On 11,420 RouterBench held-out triples, the verifier-gated cascade preserves 98% of always-Opus quality at 60% lower cost. The integration is two lines: change the base URL, set \model="auto"\. Per-request response headers (\x-nadir-routed-to\, \x-nadir-cost-usd\, \x-nadir-cost-saved\) show exactly where each call went and what it saved. Your Claude Code credit pool is $20 or $200. Your Copilot credit balance is $39. Your Codex budget is whatever your team approved. Routing is the lever that makes those credits cover more work. Not by using AI less. By using the right model for each call. Sources: GitHub Blog, "GitHub Copilot is moving to usage-based billing" (June 2026). GitHub Changelog, "Updates to GitHub Copilot billing and plans" (June 2026). The Register, "Angry devs vow to flee GitHub Copilot as metered billing takes hold" (June 2026). TechTimes, "Anthropic Ends Subscription Subsidy for Agents June 15" (June 2026). InfoWorld, "Anthropic puts Claude agents on a meter" (2026). The New Stack, "Anthropic splits billing again: Agent SDK gets separate credit pools" (2026). OpenAI Developers, "Pricing - Codex". ExplainX, "The Claude Token Economy" (2026). Stanford/Microsoft Research, arXiv:2604.22750 (April 2026). Augment Code, "Best AI Model for Coding Agents in 2026: A Routing Guide". Datadog, "State of AI Engineering 2026". AICC, "Enterprise Token Costs Drop 67% Year-Over-Year" (May 2026). Cursor Docs, "Models & Pricing". Verdent Guides, "Codex Pricing in 2026". FindSkill.ai, "Claude Code Pricing After June 15". CloudZero, "Claude Code Pricing In 2026". GitHub Docs, "Usage-based billing for individuals". IDC Blog, "Why the Future of AI Lies in Model Routing" (November 2025). FinOps Foundation, "State of FinOps 2026 Report" (2026). FourWeekMBA, "The Enterprise AI Cost Crisis" (2026). Anthropic, OpenAI, GitHub model pricing as of June 2026.