The subscription-to-metered shift happened in 30 days. In April and May 2026, three things happened within weeks of each other: GitHub announced that Copilot is moving to usage-based "AI Credits" billing on June 1. Anthropic split Claude Code and agent usage into a separate credit pool billed at full API rates, effective June 15. Cursor shipped Composer 2.5, an in-house model that costs 1/10th of frontier APIs, because even they could not stomach the per-token math. Each story looks different on the surface. Underneath, they all point to the same structural shift: flat-rate AI access is ending. Every token is getting a price tag. And the teams that route intelligently will pay a fraction of what everyone else pays. What happened at GitHub GitHub Copilot launched in 2022 as a $10/month subscription. Use it as much as you want. That deal is expiring. Starting June 1, 2026, Copilot moves to usage-based billing. Each plan gets monthly "AI Credits" equal to its price: $10 for Pro, $19 for Business, $39 for Enterprise. Usage gets calculated based on actual token consumption at published per-model API rates. Code completions and Next Edit Suggestions stay free. Everything else, chat, CLI, agents, and Spaces, consumes credits. The developer reaction was not enthusiastic. Visual Studio Magazine's coverage captured the sentiment: you pay the same price, but you get less. The calculus changed because of agents. A code completion is a few hundred tokens. An agentic coding session can burn tens of thousands. GitHub cannot offer unlimited agentic compute at $10/month and stay profitable. So they metered it. Source: GitHub Blog, "GitHub Copilot is moving to usage-based billing" What happened at Anthropic On May 14, 2026, Anthropic announced that all programmatic and agent usage of Claude will be separated from the standard subscription pool. This includes the Agent SDK, claude -p, GitHub Actions, and third-party integrations like Zed and OpenClaw. Starting June 15, agent usage draws from a dedicated "Agent SDK Credit Pool" billed at full API rates. The credit caps: $20/month on Pro, $100/month on Max 5x, $200/month on Max 20x. No rollover. The backstory is instructive. Some Max subscribers were running $1,000 to $5,000 worth of agent compute per month on a $200 subscription. That is a 5x to 25x arbitrage. Anthropic closed it. For teams building with Claude's API directly, this changes nothing. They were already paying per token. But it signals something important: even the provider with the most popular coding agent decided that unlimited agentic access is unsustainable at flat-rate pricing. Source: Axios, "Anthropic tightens Claude limits as OpenAI courts agent users," May 14, 2026 Source: InfoWorld, "Anthropic puts Claude agents on a meter," May 2026 What happened at Cursor Cursor took a different approach to the same problem. Instead of metering access to frontier models, they built their own. On May 18, 2026, Cursor released Composer 2.5, an in-house coding model built on Moonshot's open-source Kimi K2.5 checkpoint. It costs $0.50/$2.50 per million tokens (input/output), compared to $5/$25 for Claude Opus 4.7. That is a 10x price difference. On coding benchmarks, Composer 2.5 scores within a few points of Opus 4.7 and GPT-5.5. It hits 79.8% on SWE-Bench Multilingual. A complex refactoring session costs $2 to $5 on Composer 2.5 versus $20 to $50 on Opus 4.7. This is vertical integration driven by unit economics. Cursor could not keep paying frontier API prices for every coding interaction. So they trained a model that handles most coding tasks at a fraction of the cost. Source: Cursor Blog, "Introducing Composer 2.5," May 18, 2026 The common thread These three stories share a single insight: AI at flat-rate pricing does not scale when agents are involved. A chatbot interaction is a few thousand tokens. An agentic coding session can be 50,000 to 500,000 tokens. When users shift from chat to agents, consumption can jump 50x while the subscription price stays the same. No business model survives that math. The industry response is playing out in three variants: | Strategy | Who | How it works | |----------|-----|-------------| | Meter everything | GitHub Copilot | Track tokens, charge per consumption, let users manage their budget | | Segment billing | Anthropic | Keep chat unlimited, cap agent usage at API rates | | Build cheaper models | Cursor | Train an in-house model at 1/10th the cost so margins work at scale | All three approaches share one assumption: every token has a cost, and that cost must be passed through or optimized away. What this means for teams building with LLM APIs If you are building AI features, products, or internal tools using LLM APIs, the shift to metered billing does not affect you directly. You were already paying per token. But the industry-wide move validates something important: the per-token cost structure is permanent, and it is expanding to every surface. The question is not whether tokens will be metered. They already are. The question is how you respond. There are really only three strategies: Use cheaper models for everything. This works until you hit a task that requires frontier reasoning. Then quality drops and users notice. Cheap models do not throw errors on hard prompts. They return plausible-looking responses that are subtly wrong. You do not notice until a customer reports a bug or someone manually reviews the output. Use the best model for everything. This works until your bill scales linearly with usage and someone in finance asks why you are paying $25 per million output tokens to summarize emails. Route each request to the cheapest model that can handle it. Simple tasks go to Haiku or Flash. Mid-complexity tasks go to Sonnet. Only the genuinely hard problems go to Opus or GPT-5.5. Your bill reflects the actual complexity of your workload, not the price of your most expensive model. The third strategy is model routing. And the metered-billing era makes it more valuable, not less. The routing math in a metered world A typical mixed workload breaks down like this: | Complexity tier | Share of requests | Example tasks | |----------------|-------------------|---------------| | Simple | 40-50% | Summarization, formatting, classification, extraction | | Mid-complexity | 30-35% | Multi-step reasoning, code generation, structured analysis | | Complex | 15-25% | Novel problem-solving, multi-file refactoring, research synthesis | If you send everything to Claude Opus 4.7 at $5/$25 per million tokens, you pay frontier prices for tasks that Haiku handles at $1/$5 per million tokens. Route the simple tier to Haiku and the mid tier to Sonnet, and you cut your bill by 40 to 55% with no measurable quality loss on the routed requests. At scale, the difference between routing and not routing can be thousands of dollars per month. This is not theoretical. On 11,420 held-out RouterBench triples, Nadir's verifier-gated cascade preserves 98% of always-Opus quality at 60% lower cost. The cheap model answers first; a calibrated verifier (AUROC 0.961) scores it before we ship; on rejection we escalate. Why this matters more for agentic workloads Single-request savings are meaningful. Agentic savings are transformative. An agentic session with 30 turns re-sends the full context on every turn. Input tokens accumulate because each turn pays for all previous context. A session starting at 2,000 input tokens per turn can reach 30,000 by turn 30. Total input across 30 turns: roughly 480,000 tokens. Total output: roughly 150,000, assuming 5,000 per turn. At Opus 4.7 pricing, that single session costs $6.15. Route even half of those turns to a cheaper model and the cost drops to $3 to $4. Across hundreds of sessions per day, the savings compound into real money. This is exactly why GitHub metered Copilot and Anthropic capped Agent SDK credits. The per-session cost of agentic compute at frontier pricing is unsustainable at flat rates. And for teams paying API rates directly, routing is the primary lever to keep that cost under control. What comes next The metered-billing trend will accelerate. OpenAI's API has always been usage-based. Anthropic and GitHub just joined for agentic usage. Google will follow. By the end of 2026, every major AI provider will charge per token for agent-class workloads. This is not a temporary market correction. Agentic workflows consume orders of magnitude more tokens than chat. Providers cannot subsidize that gap with subscription revenue. The math does not work. For teams building on LLM APIs, the implication is straightforward: your AI bill is now directly proportional to your token consumption. Routing is the most direct lever to reduce that consumption without reducing capability. The flat-rate era gave teams the luxury of not thinking about cost per request. That luxury is gone. Sources GitHub Blog: GitHub Copilot is moving to usage-based billing Visual Studio Magazine: Devs Sound Off on Usage-Based Copilot Pricing Axios: Anthropic tightens Claude limits as OpenAI courts agent users, May 14, 2026 InfoWorld: Anthropic puts Claude agents on a meter Cursor Blog: Introducing Composer 2.5, May 18, 2026 The Decoder: Cursor's Composer 2.5 matches frontier benchmarks at a fraction of the cost The Register: Microsoft's GitHub shifts to metered AI billing