Uber's AI budget lasted four months. In early 2026, Uber rolled out Claude Code access to its full 5,000-engineer organization. Adoption moved fast. By March, 84% of engineers were classified as agentic coding users, up from 32% at launch in December 2025. Nearly 95% of Uber engineers used AI tools every month. Around 70% of committed code came from those systems. Then the bill arrived. Uber's CTO Praveen Neppalli Naga told The Information that the company had to revisit its financial assumptions after spending exceeded projections much earlier than expected. The entire 2026 AI budget was exhausted by April. Per-developer costs ranged from $150 to $250 per month on average, but heavy users ran $500 to $2,000 per month. No FinOps playbook existed for token-based billing at that scale. Source: The Information, "Uber CTO Shows How Claude Code Can Blow Up AI Budgets," May 2026 Source: Startup Fortune, "Uber Burned Its Entire 2026 AI Budget in Four Months," May 2026 This is not just an Uber problem. The Pragmatic Engineer newsletter has been tracking the same pattern across multiple companies. AI agent spending has surged roughly 10x in six months at some organizations. One company profiled went from $200 per developer per month to $3,000 per developer per month for a seven-person team. Some individual developers are spending $500 a day on Claude Code alone. Source: The Pragmatic Engineer, "The Pulse: AI token spending out of control," 2026 DX's survey of engineering leaders tells the planning side of the story. When asked about their 2025 spending, 38.4% of leaders reported spending $101 to $500 per developer per year on AI tools. Only 10.5% were spending over $1,000. For 2026, many planned to allocate 1 to 3% of their total engineering budgets. The actual numbers blew past those plans. DX now estimates the realistic floor at $500 to $1,000 per developer per year, with multi-vendor setups pushing costs to $3,000 or more. That is 3 to 6x what most leaders budgeted. Source: DX, "How are engineering leaders approaching 2026 AI tooling budgets?" Deloitte's January 2026 report, "The Pivot to Tokenomics," confirmed the trend at the enterprise level. AI has become the single fastest-growing line item in corporate technology budgets, consuming a quarter to one-half of IT spend at some firms. Cloud bills are up 19% year over year, driven almost entirely by generative AI workloads. Source: Deloitte Insights, "AI tokens: How to navigate AI's new spend dynamics," January 2026 The root cause is uniform model selection. The spending explosion has a straightforward explanation. Every call in every agentic coding session hits a frontier model. File reads, status checks, linting, formatting, classification, and simple code generation all go to Opus or GPT-5.5. The model does not distinguish between a request that requires deep reasoning and one that could be handled by a model costing 5 to 10x less. Developer surveys back this up. A study tracking 42 agent runs on a FastAPI codebase found 70% of tokens were waste from reading too many files, failed attempts, and verbose tool output. A separate analysis found 87% of tokens went to finding code, not writing it. In both cases, the wasted tokens were billed at frontier rates. Token cost volatility topped the pain points in a Q1 2026 developer survey at 42%. Monthly bills swing 2 to 3x quarter over quarter because agentic workloads are inherently stochastic. The same task can cost 30x more on one run than another, as Stanford and Microsoft Research documented in their study of coding agent token consumption. The pattern repeats at every scale. Individual developers hit subscription limits and switch to API pricing, where costs balloon further. Teams adopt multiple tools (Claude Code, Cursor, Copilot) and pay for overlapping coverage. Organizations set annual budgets based on 2025 usage patterns and blow through them in Q1 when agentic adoption takes off. The data on what routing changes. The fix is not to use cheaper models for everything. Sonnet 4.6 delivers 99% of Opus 4.6's performance on SWE-bench (79.6% vs 80.8%), but there are tasks where that 1.2 percentage points matters. The fix is to stop sending every request to the most expensive model. Augment Code published a routing guide in 2026 that breaks coding agent workflows into roles: Opus for coordination and architecture decisions, Sonnet for implementation, Haiku for file navigation and classification. Their data shows three-tier Claude routing saves 51% compared to uniform Opus deployment. Source: Augment Code, "Best AI Model for Coding Agents in 2026: A Routing Guide" Enterprise data tells the same story at a larger scale. Between Q1 2025 and Q1 2026, average enterprise cost per million tokens fell from $18.40 to $6.07. Token price cuts explain roughly half. The other half came from multi-model routing, now used by 42% of enterprises. The average number of models per enterprise account grew from 2.1 to 4.7 in that period. Source: Open Source For You, "Enterprise AI Costs Crash 67% As Open Source Models And Multi-Model Routing Go Mainstream," May 2026 Organizations that fully implemented tiered routing architectures achieved median blended costs of $2.31 per million tokens, compared to $18.40 for frontier-only deployments. That is an 87.4% reduction. What the Uber math looks like with routing. Take Uber's numbers. 5,000 engineers, $150 to $250 per developer per month average, $500 to $2,000 for heavy users. Call it $1 million per month total at the midpoint. If 60% of agentic coding calls are low complexity (file reads, simple edits, formatting, classification) and route to Haiku at $1/$5 per million tokens instead of Opus at $5/$25, the blended cost drops by 40 to 50%. That turns $1 million per month into $500,000 to $600,000. The annual difference is $4.8 to $6 million. That is not a theoretical number. It is the same range the enterprise data shows. Teams with tiered routing pay roughly half what teams with uniform model selection pay, and the quality gap on routed requests is within measurement noise for the tasks being routed. The per-developer math is equally direct. A heavy user spending $1,500 per month on uniform Opus usage drops to $750 to $900 with routing. A team of seven spending $3,000 per developer per month drops to $1,500 to $1,800. The budget that was supposed to last a year lasts a year. Three things engineering leaders should do now. 1. Audit where tokens actually go. Most teams do not know the complexity breakdown of their agentic workloads. Instrument a week of production traffic. Classify each call by complexity. The data will almost certainly show that 50 to 70% of calls do not need a frontier model. GitHub found 37% of their agentic tokens were pure waste. Your number is probably similar. 2. Route per call, not per session. The model that plans architecture changes is not the model that should read file listings. A trained classifier that evaluates each API call independently and routes to the cheapest capable model captures the gap between frontier and commodity pricing on the majority of calls. Static rules based on prompt length or keyword matching drift as workloads change. A classifier trained on observed outcomes adapts. 3. Set per-developer budgets with routing in the loop. DX's survey shows most engineering leaders are still planning AI budgets based on subscription sticker prices. That does not work when heavy users run $500 to $2,000 per month in token costs. Budget at the per-developer level, and include routing savings in the forecast. A realistic 2026 budget with routing is $1,000 to $1,500 per developer per year. Without routing, plan for $3,000 or more. Where Nadir fits. Nadir routes each API call through a trained classifier in under 10 ms. For coding agent workloads where the majority of calls are low to mid complexity, this means most of your traffic hits Haiku or Sonnet pricing instead of Opus pricing. The integration is two lines: change the base URL, set model="auto". Per-request response headers (x-nadir-routed-to, x-nadir-cost-saved, x-nadir-cost-usd) show exactly where each call was routed and what it saved. The dashboard aggregates savings by day, week, and month. No instrumentation needed beyond the two-line change. For teams staring at AI budgets that are tracking 3 to 6x above plan, routing is the most direct lever that does not require changing tools, rewriting prompts, or downgrading model quality on the requests that actually need it. Sources: The Information, "Uber CTO Shows How Claude Code Can Blow Up AI Budgets" (May 2026). Startup Fortune, "Uber Burned Its Entire 2026 AI Budget in Four Months" (May 2026). The Pragmatic Engineer, "The Pulse: AI token spending out of control" (2026). DX, "How are engineering leaders approaching 2026 AI tooling budgets?". Deloitte Insights, "AI tokens: How to navigate AI's new spend dynamics" (January 2026). Augment Code, "Best AI Model for Coding Agents in 2026: A Routing Guide". Open Source For You, "Enterprise AI Costs Crash 67%" (May 2026). Stanford/Microsoft Research, "How Do AI Agents Spend Your Money?" arXiv:2604.22750 (April 2026). Anthropic, OpenAI model pricing as of May 2026.