Enterprises adopted multi-model AI. They forgot the routing layer.

F5's 2026 State of Application Strategy Report surveyed 1,800 organizations across every major industry vertical. The headline finding: 78% of enterprises now run AI inference as a core production operation. Not experiments. Not pilots. Production workloads handling real traffic.

The average organization operates seven AI models in production or active evaluation. For 77% of respondents, inference (not training or fine-tuning) is the dominant AI activity. Only 8% rely exclusively on a single public AI service.

Source: F5, "AI Has Left the Lab: F5 Report Reveals 78% of Enterprises Now Run AI Inference as a Core Operation," May 2026
Source: Help Net Security, "Multi-model AI is creating a routing headache for enterprises," May 2026

Seven models, one routing strategy: send everything to the most expensive one.

Having seven models available does not mean seven models are being used well. Most enterprises adopted new models opportunistically. One team evaluated GPT-5, another integrated Claude, and a third picked up an open-source model for a batch pipeline. Each team hardcoded its model choice.

The result is a multi-model portfolio with single-model routing. Every request within a given application still goes to whichever model the original developer chose. Nothing evaluates each request to see whether a cheaper model could handle the task.

F5's data confirms this gap. While 52% of organizations chain or orchestrate multiple AI models together, the chaining is mostly sequential (model A processes, then model B refines). It is not cost-aware routing, where each request is independently evaluated and sent to the cheapest capable model.

The report describes the operational challenge directly: enterprises are expanding traffic management, identity controls, observability, and routing systems for multiple AI models across hybrid environments.
But expanding infrastructure to support multiple models is not the same as routing intelligently between them.

Source: F5, "F5 Report 2026: AI inferencing has arrived, complicating an already complex IT landscape," May 2026

The cost of the gap.

AICC analyzed 2.4 billion API calls, and the numbers show what happens on both sides of this divide. Enterprises that adopted intelligent multi-model routing achieved a median 71% cost reduction compared to single-provider deployments. The top quartile exceeded 80%.

The effective blended cost per million tokens dropped from $18.40 to $6.07 across the dataset, a 67% year-over-year decline. But routing optimization accounted for an estimated 34 percentage points of that 67% drop. The remaining 33 points came from model price cuts (DeepSeek V4, Gemini Flash, open-source competition). In other words, enterprises that got cheaper models but did not route to them captured roughly half the available savings. The other half required a routing decision at the request level.

Open-source and open-weight models captured 38% of enterprise token volume in Q1 2026, up from 11% in Q1 2025. That is a 245% increase in share in twelve months. But cheap models in your portfolio only help if your application actually sends requests to them when they are the right fit.

Source: AICC, "Enterprise Token Costs Drop 67% Year-Over-Year as Multi-Model AI Adoption Hits Record High," May 2026

IDC says routing is where this is going.

IDC's AI and Automation FutureScape predicts that by 2028, 70% of top AI-driven enterprises will use advanced multi-tool architectures to dynamically and autonomously manage model routing across diverse models. That prediction implies that today, in mid-2026, fewer than 70% of those enterprises have this in place. The excess spend lives in the gap between multi-model adoption (widespread) and dynamic model routing (early).
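The AICC figures above are internally consistent, and a few lines of arithmetic make the split explicit. What follows is a minimal Python sketch. The inputs are the reported figures. Rounding the total decline to whole points before subtracting the routing share is an assumption that mirrors how the figures are stated.

```python
def pct_decline(before: float, after: float) -> float:
    """Percentage decline from `before` to `after`."""
    return (before - after) / before * 100


def pct_increase(before: float, after: float) -> float:
    """Relative percentage increase from `before` to `after`."""
    return (after - before) / before * 100


# Blended cost per million tokens, as reported by AICC.
BEFORE_USD, AFTER_USD = 18.40, 6.07
total_decline = pct_decline(BEFORE_USD, AFTER_USD)  # ~67.0

# AICC attributes an estimated 34 points of the decline to routing;
# the remainder is attributed to model price cuts.
ROUTING_POINTS = 34.0
price_cut_points = round(total_decline) - ROUTING_POINTS  # 33.0

# Fraction of the total decline that required request-level routing.
routing_share = ROUTING_POINTS / round(total_decline)  # ~0.51

# Open-weight share of token volume, Q1 2025 -> Q1 2026 (relative growth).
share_growth = pct_increase(11.0, 38.0)  # ~245.5

print(f"total decline:    {total_decline:.1f}%")
print(f"price-cut points: {price_cut_points:.0f}")
print(f"routing share:    {routing_share:.0%}")
print(f"share growth:     {share_growth:.0f}%")
```

The "roughly half" claim follows from the third line: 34 of 67 points is just over 50%.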
IDC frames the value of routing as threefold:

- Performance optimization: selecting the most context-appropriate model per request.
- Cost reduction: routing commodity tasks to commodity models.
- Insulation from technology churn: swapping models without rewriting applications.

The third point matters more than teams realize. Model pricing changes every quarter. New models launch monthly. A routing layer absorbs these shifts. A hardcoded model choice does not.

Source: IDC Blog, "Why the Future of AI Lies in Model Routing," November 2025

Deloitte calls it "the AI infrastructure reckoning."

Deloitte's 2026 Tech Trends report dedicates an entire section to what it calls the AI infrastructure reckoning. The core argument: inference costs dropped 280-fold over two years, but enterprise AI spending kept climbing. Cheaper tokens made agentic workflows viable, and agentic workflows consume 5 to 30x more tokens per task.

The math is counterintuitive but consistent. Per-token prices fell roughly 99.6%: a 280-fold drop leaves each token at about 0.36% of its former price. Total inference spend tripled to $37 billion. The culprit is volume. When tokens get cheap enough, teams deploy agents, chains, and orchestrated workflows that consume orders of magnitude more tokens than the chatbots they replaced.

Deloitte's prescription includes a three-tier hybrid model:

- Public cloud for elastic training.
- Private infrastructure for predictable high-volume inference.
- Edge computing for latency-critical decisions.

But underneath the infrastructure layer, the routing question remains: which model handles which request? The infrastructure tier does not answer that. The routing layer does.

Source: Deloitte, "The AI infrastructure reckoning: Optimizing compute strategy in the age of inference economics," Tech Trends 2026

What a routing layer needs to do in a 7-model world.

When you operate one or two models, you can get away with if-else routing: Model A for one use case, Model B for another. The routing logic fits in 20 lines.
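For contrast, the if-else approach described above looks roughly like the minimal sketch below. The use-case labels and model identifiers are hypothetical placeholders. The prompt is accepted but never inspected, which is the limitation at issue.

```python
# Hypothetical model identifiers; substitute whatever your providers expose.
SUMMARY_MODEL = "model-a"
CODE_MODEL = "model-b"
DEFAULT_MODEL = "model-a"


def route_static(use_case: str, prompt: str) -> str:
    """Static routing: the model is fixed per use case, not per request.

    `prompt` is deliberately unused. A one-line lookup and a multi-step
    reasoning task in the same use case go to the same model.
    """
    if use_case == "summarize":
        return SUMMARY_MODEL
    elif use_case == "codegen":
        return CODE_MODEL
    else:
        return DEFAULT_MODEL
```

This works while the table of use cases and models stays small. Every new model or price change means editing and redeploying the branch logic.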
At seven models, if-else routing breaks down. The combinatorics of model capabilities, pricing tiers, latency profiles, and task types exceed what static rules can handle. F5's report noted that 93% of surveyed organizations operate in hybrid multicloud environments, with 86% distributing applications across on-premises, public cloud, and colocation. Every inference request becomes a routing decision weighed against cost, accuracy, availability, latency, and geographic constraints.

A production routing layer at this scale needs four things:

1. Per-request classification. A trained classifier evaluates each API call independently, reading the prompt and outputting a complexity tier in under 10 ms. Not prompt length. Not keyword matching. Semantic complexity.
2. Cost-aware model selection. Given the complexity tier, the router selects the cheapest model that meets the quality threshold. This requires a live pricing table and performance data per model per tier.
3. Automatic failover. When a provider goes down (and they all do), the router retries against the next model in the chain. The application never sees the failure.
4. Per-request observability. Every routing decision is logged with the model selected, the cost incurred, the cost saved versus the default model, and the latency added. Without this, you cannot validate that routing is actually working.

Static rules cannot adapt when a new model launches with better price-performance on mid-tier tasks. A trained classifier, retrained on observed outcomes, can.

The gap is closing, but slowly.

The trajectory is clear. Multi-model adoption happened in 2025. Multi-model routing is happening in 2026. By 2028, IDC expects it to be standard at top AI-driven enterprises. Engineering teams running seven models today must decide whether to wait for routing to become standard or capture the savings now. The AICC data puts the difference at 34 percentage points of cost reduction.
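The four components listed earlier can be combined into a single routing function. This is a minimal Python sketch, not Nadir's or any vendor's implementation. The `Model` fields, tier numbering, prices, and the `classify` and `call` callables are hypothetical stand-ins. The classifier is passed in rather than implemented, since the text is explicit that a real one is trained rather than rule-based.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_mtok: float  # blended price per million tokens (hypothetical)
    max_tier: int        # highest complexity tier this model handles acceptably


@dataclass
class RoutingRecord:
    model: str
    cost_usd: float
    saved_usd: float     # versus sending the same request to `default`
    latency_ms: float    # routing overhead only (classification + selection)


class ProviderError(Exception):
    """Raised by a `call` function when a provider is unavailable."""


def route(
    prompt: str,
    tokens: int,
    models: list[Model],
    default: Model,
    classify: Callable[[str], int],
    call: Callable[[Model, str], str],
) -> tuple[str, RoutingRecord]:
    start = time.perf_counter()
    # 1. Per-request classification (hypothetical trained classifier).
    tier = classify(prompt)
    # 2. Cost-aware selection: capable models, cheapest first.
    #    The same ordering doubles as the failover chain.
    chain = sorted(
        (m for m in models if m.max_tier >= tier),
        key=lambda m: m.usd_per_mtok,
    )
    routing_ms = (time.perf_counter() - start) * 1000

    last_error: Optional[Exception] = None
    for model in chain:
        try:
            reply = call(model, prompt)
        except ProviderError as err:
            # 3. Automatic failover: try the next-cheapest capable model.
            last_error = err
            continue
        # 4. Per-request observability: one record per routing decision.
        cost = tokens / 1_000_000 * model.usd_per_mtok
        baseline = tokens / 1_000_000 * default.usd_per_mtok
        record = RoutingRecord(model.name, cost, baseline - cost, routing_ms)
        return reply, record
    raise RuntimeError(f"no capable model available for tier {tier}") from last_error
```

Because prices and capabilities live in the `models` list rather than in branch logic, adding a newly launched model or updating a price changes routing without touching callers. This is the churn insulation IDC describes. Retraining `classify` on logged outcomes is left out of the sketch.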
On a $50,000 per month inference bill, 34 percentage points of cost reduction is $17,000 per month, or $204,000 per year.

Where Nadir fits.

Nadir is a routing layer purpose-built for this problem. A trained classifier evaluates each API call in under 10 ms and routes it to the cheapest model that can handle it. Integration is two lines: change the base URL and set model="auto".

Per-request response headers (x-nadir-routed-to, x-nadir-cost-saved, x-nadir-cost-usd, x-nadir-latency-ms) provide the per-request observability that F5's report identifies as a gap. The dashboard aggregates savings by day, week, and month. No instrumentation is needed beyond the two-line change.

Some teams already have multiple models available but route by developer preference instead of task complexity. For them, Nadir turns a multi-model portfolio into a multi-model routing strategy. The models you already pay for start earning their keep.

Sources:
F5, "AI Has Left the Lab: 78% of Enterprises Now Run AI Inference as a Core Operation" (May 2026).
F5, "F5 Report 2026: AI inferencing has arrived" (May 2026).
Help Net Security, "Multi-model AI is creating a routing headache for enterprises" (May 2026).
AICC, "Enterprise Token Costs Drop 67% Year-Over-Year" (May 2026).
IDC Blog, "Why the Future of AI Lies in Model Routing" (November 2025).
Deloitte, "The AI infrastructure reckoning," Tech Trends (2026).