Microsoft is pulling Claude Code. The reason is the bill.

In mid-May 2026, Microsoft began cancelling most internal Claude Code licenses across its Experiences and Devices division. The directive covers thousands of engineers working on Windows, Microsoft 365, Teams, Outlook, and Surface. Access ends June 30, 2026. Developers are being redirected to GitHub Copilot CLI.

The pilot launched in December 2025. By May, adoption had spread well beyond engineering to product managers and designers. The tool was popular. It was also expensive. Microsoft's financial year ends June 30, and the Claude Code line item was hard to defend at enterprise scale.

Microsoft framed the move as platform standardization. The financial pressure tells the rest of the story.
Source: Windows Central, "Microsoft cancels Claude Code licenses, shifting developers to GitHub Copilot CLI," May 2026
Source: Crypto Briefing, "Microsoft cancels Claude Code licenses as AI costs surge across the industry," May 2026

Microsoft is not alone. This is an industry pattern.

Uber burned through its entire 2026 AI budget by April. Claude Code adoption jumped from 32% to 84% across a 5,000-engineer organization in three months. Per-developer costs ran $500 to $2,000 per month for heavy users. The CTO had to revisit financial assumptions after spending exceeded projections by a wide margin.
Source: The Information, "Uber CTO Shows How Claude Code Can Blow Up AI Budgets," May 2026

At Amazon, the internal culture went the opposite direction. Employees were encouraged to maximize token consumption in a practice they called "tokenmaxxing." The bet was that heavy AI usage would drive enough productivity to justify the cost. The bill is still being tallied.

At Meta, an employee built an internal leaderboard called "Claudeonomics" that ranked the company's roughly 85,000 workers by token consumption. In a 30-day window, total usage on the dashboard exceeded 60 trillion tokens.
Source: Tom's Hardware, "AI cost crisis hits tech giants as employee tokenmaxxing backfires," May 2026
Source: Fortune, "Microsoft reports are exposing AI's real cost problem," May 2026

The numbers across the industry tell the same story.

The Pragmatic Engineer newsletter tracked AI agent spending surging roughly 10x in six months at some organizations. Individual developers report monthly bills between $500 and $2,000. One company profiled went from $200 per developer per month to $3,000 for a seven-person team.

DX surveyed engineering leaders on their 2026 AI budgets. When asked about 2025 spending, 38.4% of leaders reported $101 to $500 per developer per year. For 2026, most planned to allocate 1% to 3% of their engineering budget to AI tools. Actual costs are running $3,000 or more per developer per year. That is 3x to 6x what most leaders budgeted.
Source: DX, "How are engineering leaders approaching 2026 AI tooling budgets?"

Gartner's May 2026 report estimates the enterprise AI coding agent market at $9.8 billion to $11 billion annualized. AI agent cost overruns are a top concern among IT executives, with most attributing the overruns to increased usage under consumption-based pricing.
Source: Gartner, "The market for enterprise AI coding agents is entering a new phase," May 2026

Software vendors are also adding their own pressure. AI-driven renewals are raising enterprise software prices 20% to 37% through forced SKU migrations and credit-based pricing. The "AI tax" is compounding on top of the token bill.
Source: Tropic, "The AI Tax: How AI Is Driving Software Price Increases," 2026

Why the bill grows faster than the budget.

The root cause is straightforward. Every call in every agentic coding session hits a frontier model. File reads, status checks, linting, formatting, classification, and simple code generation all go to Opus or GPT-5 at $5 to $25 per million tokens.
The session does not distinguish between a request that requires deep reasoning and one that Haiku could handle at $1 per million tokens.

Agentic workflows consume 1,000x more tokens than chat. Stanford and Microsoft Research documented this in their study of eight frontier LLMs on SWE-bench. Runs on the same task vary by 30x. Spending more tokens does not improve accuracy. 40% to 60% of input tokens are removable waste.

The math is counterintuitive but consistent. Per-token prices have fallen 99.7% since 2024. Enterprise AI spend tripled to $37 billion anyway. Cheaper tokens made agentic workflows viable. Agentic workflows consume 5x to 30x more tokens per task than the chatbots they replaced. Volume growth outpaced price cuts.

Goldman Sachs projects this will accelerate. Their forecast calls for a 24x increase in token consumption by 2030, reaching 120 quadrillion tokens per month. If consumption grows faster than unit costs fall, the bill gets worse, not better.
Source: Goldman Sachs, "AI Agents Forecast to Boost Tech Cash Flow as Usage Soars," May 2026
Source: Stanford/Microsoft Research, "How Do AI Agents Spend Your Money?", arXiv:2604.22750, April 2026

The fix is not to use AI less. It is to route each call.

Microsoft's answer was to cancel licenses and consolidate on a cheaper tool. That is one approach. It trades capability for cost control. It does not address the underlying problem, which is uniform model selection: every API call, regardless of complexity, goes to the most expensive model. File reads go to Opus. Status checks go to Opus. Simple code generation goes to Opus. The 60% to 70% of calls that do not need a frontier model get billed at frontier rates anyway.

Enterprise data shows what happens when you fix this. Between Q1 2025 and Q1 2026, organizations that implemented tiered routing achieved median blended costs of $2.31 per million tokens, compared to $18.40 for frontier-only deployments. That is an 87% reduction.
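The 87% figure, and the effect of routing only part of the traffic, can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, the $18.40 and $2.31 blended rates are the ones quoted above, the $5 and $1 per-million prices are the frontier and Haiku figures mentioned earlier, and the assumption that tokens are spread evenly across calls is a simplification.

```python
def percent_reduction(baseline: float, routed: float) -> float:
    """Percentage saved when cost falls from `baseline` to `routed`."""
    return (baseline - routed) / baseline * 100


def blended_rate(frontier_rate: float, cheap_rate: float, cheap_share: float) -> float:
    """Average price per million tokens when `cheap_share` of tokens use the cheap tier.

    Assumption for illustration: a share of calls equals the same share of tokens.
    """
    if not 0.0 <= cheap_share <= 1.0:
        raise ValueError("cheap_share must be between 0 and 1")
    return (1 - cheap_share) * frontier_rate + cheap_share * cheap_rate


# Blended rates quoted above: frontier-only vs. tiered routing.
quoted_saving = percent_reduction(18.40, 2.31)

# Illustrative scenario: 60% of tokens move from a $5 tier to a $1 tier.
partial = blended_rate(frontier_rate=5.0, cheap_rate=1.0, cheap_share=0.6)
partial_saving = percent_reduction(5.0, partial)

print(f"Quoted blended rates: {quoted_saving:.0f}% lower")
print(f"60% routed to the cheap tier: ${partial:.2f}/M tokens, {partial_saving:.0f}% lower")
```

Moving 60% of tokens to a tier that costs one fifth as much cuts the blended rate by a little under half. Getting closer to the quoted 87% would require routing a larger share, using cheaper tiers, or trimming wasted tokens as well.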
Multi-model routing is now used by 42% of enterprises, up from single digits a year ago.
Source: AICC, "Enterprise Token Costs Drop 67% Year-Over-Year," May 2026

Augment Code published routing data specific to coding agent workflows. Their analysis shows three-tier Claude routing (Opus for architecture decisions, Sonnet for implementation, Haiku for file navigation and classification) saves 51% compared to uniform Opus deployment.
Source: Augment Code, "Best AI Model for Coding Agents in 2026: A Routing Guide"

What the Microsoft math looks like with routing.

Microsoft's Experiences and Devices division has thousands of engineers. Assume 3,000 active Claude Code users at an average of $300 per month (conservative, given industry data showing $500 to $2,000 for heavy users). That is $900,000 per month, or $10.8 million per year.

Suppose 60% of agentic coding calls are low complexity and route to Haiku at $1/$5 per million tokens (input/output) instead of Opus at $5/$25. With 40% of spend staying at the Opus rate and 60% moving to one fifth of it, the bill falls to roughly 52% of the original, assuming tokens are spread evenly across calls. Allowing for some slack, the blended cost drops 40% to 50%. That turns $900,000 per month into $450,000 to $540,000. The annual savings are $4.3 to $5.4 million.

Microsoft chose to cancel licenses instead. The engineers still need AI coding tools. They will use Copilot CLI, which also runs on token-based pricing under the hood. The cost problem did not go away. It moved.

What engineering leaders should do instead.

1. Audit the complexity breakdown. Most teams do not know what percentage of their agentic calls actually need a frontier model. Instrument a week of traffic. Classify each call by complexity. GitHub found 37% of their agentic tokens were pure waste. Independent analyses show 60% to 70% of calls do not need Opus or GPT-5.

2. Route per call, not per tool. The model that plans architecture changes is not the model that should read file listings. A trained classifier that evaluates each API call independently and routes to the cheapest capable model captures the pricing gap on the majority of calls.
Augment Code's data shows 51% savings from three-tier routing alone.

3. Budget with routing in the forecast. DX's survey shows engineering leaders planned for $500 to $1,000 per developer per year. Actual costs are running $3,000 or more. With routing, a realistic budget is $1,000 to $1,500 per developer per year. Without routing, plan for $3,000 or more and expect to blow through it.

4. Do not cancel the tool. Fix the model selection. Microsoft's approach trades capability for cost control. Routing preserves the capability (frontier models for tasks that need them) while cutting cost on the 60% to 70% of calls that do not.

Where Nadir fits.

Nadir routes each API call through a trained classifier in under 10 ms. Haiku for file reads and classification. Sonnet for implementation. Opus only when the prompt actually needs deep reasoning.

The integration is two lines: change the base URL, set model="auto". Per-request response headers (x-nadir-routed-to, x-nadir-cost-saved, x-nadir-cost-usd) show exactly where each call went and what it saved. The dashboard aggregates savings by day, week, and month. No instrumentation beyond the two-line change.

For teams watching their AI coding budget evaporate, routing is the most direct lever. It does not require changing tools, rewriting prompts, or downgrading model quality on the requests that actually need it. The requests that do not need a frontier model stop being billed at frontier rates.

Sources:
Windows Central, "Microsoft cancels Claude Code licenses" (May 2026).
Crypto Briefing, "Microsoft cancels Claude Code licenses as AI costs surge" (May 2026).
Tom's Hardware, "AI cost crisis hits tech giants as employee tokenmaxxing backfires" (May 2026).
Fortune, "Microsoft reports are exposing AI's real cost problem" (May 2026).
The Information, "Uber CTO Shows How Claude Code Can Blow Up AI Budgets" (May 2026).
DX, "How are engineering leaders approaching 2026 AI tooling budgets?".
Gartner, "Enterprise AI coding agents market" (May 2026).
Goldman Sachs, "AI Agents Forecast" (May 2026).
Stanford/Microsoft Research, arXiv:2604.22750 (April 2026).
AICC, "Enterprise Token Costs Drop 67%" (May 2026).
Augment Code, "AI Model Routing Guide".
Tropic, "The AI Tax" (2026).
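
The audit and per-call routing steps recommended in this article can be sketched in a few lines of Python. This is a minimal illustration, not how Nadir or any other router works: the rule-based `classify` function is a hypothetical stand-in for a trained classifier, the tier names and call kinds are invented for the example, and the prices are the $5 and $1 per-million input figures quoted in the article plus an assumed $3 middle tier.

```python
from collections import Counter
from dataclasses import dataclass

# Price per million input tokens. "frontier" and "small" use the $5 and $1
# figures quoted in the article; the "mid" price is an assumption.
PRICE_PER_MILLION = {"frontier": 5.0, "mid": 3.0, "small": 1.0}

# Hypothetical call kinds that an agent harness might log.
SMALL_KINDS = {"file_read", "status_check", "lint", "format", "classify"}
MID_KINDS = {"implement", "write_test"}


@dataclass(frozen=True)
class Call:
    kind: str          # what the agent was doing
    input_tokens: int  # tokens sent with the request


def classify(call: Call) -> str:
    """Toy stand-in for a trained classifier: pick the cheapest plausible tier."""
    if call.kind in SMALL_KINDS:
        return "small"
    if call.kind in MID_KINDS:
        return "mid"
    return "frontier"  # unknown or complex work stays on the frontier model


def cost_usd(tokens: int, tier: str) -> float:
    """Input-token cost of one call on the given tier."""
    return tokens / 1_000_000 * PRICE_PER_MILLION[tier]


def audit(calls: list[Call]) -> dict:
    """Compare frontier-only spend with routed spend for a logged traffic sample."""
    tiers = [classify(c) for c in calls]
    uniform = sum(cost_usd(c.input_tokens, "frontier") for c in calls)
    routed = sum(cost_usd(c.input_tokens, t) for c, t in zip(calls, tiers))
    return {
        "calls_by_tier": dict(Counter(tiers)),
        "uniform_usd": uniform,
        "routed_usd": routed,
        "saved_pct": (uniform - routed) / uniform * 100 if uniform else 0.0,
    }


# Hand-written sample; a real audit would read a week of logged requests.
sample = [
    Call("file_read", 200_000),
    Call("status_check", 100_000),
    Call("lint", 100_000),
    Call("implement", 300_000),
    Call("plan_architecture", 300_000),
]
report = audit(sample)
print(report)
```

On real traffic the split would come from logged requests rather than a hand-written sample, and a misrouted hard prompt costs quality rather than money, so any such classifier would need its own accuracy check before its savings estimate is trusted.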