Between late April and late May 2026, three deals rewrote the AI infrastructure map. On May 26, OpenRouter announced a $113 million Series B led by CapitalG, Alphabet's independent growth fund. The round valued the company at $1.3 billion. NVentures (NVIDIA's venture arm), ServiceNow Ventures, MongoDB Ventures, Snowflake Ventures, and Databricks Ventures participated alongside existing investors Andreessen Horowitz and Menlo Ventures.

Four weeks earlier, on April 30, Palo Alto Networks announced its intent to acquire Portkey, the production-grade AI gateway that had been the most visible enterprise alternative to OpenRouter. Reports suggest a valuation between $120 million and $140 million, double what Portkey was worth after its $15 million Series A in February 2026.

In between, DeepInfra closed a $107 million Series B for inference infrastructure, with backing from 500 Global, NVIDIA, Samsung Next, and Felicis. And in the background, Martian, the San Francisco startup that bills itself as the inventor of the first LLM router, is reportedly nearing a $1.3 billion valuation of its own.

Source: BusinessWire, "OpenRouter Raises $113 Million CapitalG-led Series B as Weekly Volume Explodes to 25T Tokens," May 2026

Source: Palo Alto Networks, "Palo Alto Networks to Acquire Portkey to Secure the Rise of AI Agents," April 2026

Source: Let's Data Science, "DeepInfra Raises $107M to Scale Inference Infrastructure," 2026

Source: Medium, "Martian, the San Francisco-based startup that invented the first LLM router, is reportedly nearing a $1.3B valuation," April 2026

Add it up. At least $220 million in disclosed funding, one acquisition by a $100B+ security company, and two separate $1.3 billion valuations. All in 30 days. All for companies that sit between your application and the LLM provider. The inference routing layer just became mandatory infrastructure.

The market is real. The numbers confirm it.
Market.us projects the global AI inference gateways market will grow from $1.87 billion in 2024 to $25.78 billion by 2034, a 30% compound annual growth rate. IDC predicts that by 2028, 70% of top AI-driven enterprises will use dynamic model routing to manage workloads across diverse models.

Source: Market.us, "AI Inference Gateways Market Size, CAGR of 30%"

Source: IDC Blog, "Why the Future of AI Lies in Model Routing," November 2025

The demand signal is not speculative. OpenRouter processes 25 trillion tokens per week, up 5x from 5 trillion six months ago. That is roughly 100 trillion tokens per month flowing through a single routing layer. The company hit $50 million in annualized revenue in early 2026, up from $10 million in October 2025. That is a fivefold increase in roughly five months.

Source: TechCrunch, "OpenRouter more than doubles valuation to $1.3B in a year," May 2026

Palo Alto Networks did not acquire Portkey because AI gateways are a nice feature. They acquired it because every enterprise deploying AI agents needs a control plane between the application and the model provider, and that control plane is becoming a security surface. Portkey gets folded into Prisma AIRS, Palo Alto's AI security platform. The message: routing is not just cost infrastructure. It is security infrastructure.

Source: Cybersecurity Magazine, "Securing AI: Behind Palo Alto Networks' Portkey Acquisition," 2026

What the money is actually buying.

Not all routing is the same. The money landed on three distinct architectures, and the differences matter for your bill.

Traffic management (OpenRouter, Portkey). These platforms sit between your application and model providers. They offer a unified API, provider failover, rate limit handling, and usage tracking. OpenRouter supports 400+ models across dozens of providers. When a provider goes down or hits rate limits, traffic reroutes automatically. This is valuable, and for many teams it is the first layer of routing they adopt.
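The failover behavior described above can be sketched in a few lines. This is a minimal illustration, not OpenRouter's or Portkey's implementation: the `Provider` type, the `ProviderUnavailable` exception, and the two stub providers are all hypothetical names introduced for the example, and a real gateway would add retries, backoff, and health tracking.

```python
from dataclasses import dataclass
from typing import Callable


class ProviderUnavailable(Exception):
    """Hypothetical error a provider call raises on an outage or rate limit."""


@dataclass
class Provider:
    name: str
    call: Callable[[str], str]  # takes a prompt, returns a completion


def complete_with_failover(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try each provider in priority order; return (provider_name, completion)."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider.name, provider.call(prompt)
        except ProviderUnavailable as exc:
            # Record the failure and fall through to the next provider.
            errors.append(f"{provider.name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def rate_limited(prompt: str) -> str:
    """Stub provider that always reports a rate limit."""
    raise ProviderUnavailable("429 rate limited")


def healthy(prompt: str) -> str:
    """Stub provider that always succeeds."""
    return f"echo: {prompt}"


if __name__ == "__main__":
    chain = [Provider("primary", rate_limited), Provider("backup", healthy)]
    print(complete_with_failover("hello", chain))
```

Note what the sketch does not do: it never looks at the prompt. The same model family is requested from whichever provider is up, which is the limit of traffic management on its own.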
But traffic management does not decide which model should handle which request. OpenRouter's Auto Router, powered by Not Diamond, analyzes prompts and selects from a pool of 33 models. The optimization target is output quality, not cost. If the router picks Claude Opus for a prompt that Haiku could handle, you pay 5x more for an equivalent result.

Source: OpenRouter Docs, "Auto Router - Intelligent Model Selection"

Inference infrastructure (DeepInfra). DeepInfra builds the compute layer that runs models. Their $107 million goes toward GPU clusters, inference optimization, and serving open-source models at competitive prices. This is the supply side of the market: making inference cheaper and faster at the hardware level.

Model routing (Martian, Not Diamond, Nadir). This is the decision layer. A classifier reads each prompt, evaluates its complexity, and routes to the cheapest model that can handle it. The optimization target is cost-quality tradeoff, not raw quality maximization. The goal is to pay Haiku prices for Haiku-level tasks and reserve Opus for tasks that genuinely need it.

The distinction matters because the first two categories add value without touching model selection. They make it easier to access models and run them cheaper at the hardware level. The third category changes which model handles each request, and that is where the 50% to 80% cost reductions come from.

The gap between routing tokens and routing decisions.

OpenRouter processes 100 trillion tokens per month. That volume is impressive, but volume alone does not optimize cost. Most of those tokens flow through a model the developer hardcoded or the Auto Router selected for quality, not cost efficiency.

OpenRouter charges a 5.5% platform fee on pay-as-you-go usage. Their business model scales with token volume. More tokens, more revenue. Cost optimization for the customer, routing fewer tokens to expensive models, works against this incentive. This is not a criticism.
It is the economics of a gateway business. The gateway wants more traffic. The cost optimizer wants less expensive traffic.

Source: OpenRouter Pricing

The same gap shows up in the routing quality data. On the RouterArena academic benchmark, Not Diamond (which powers OpenRouter's Auto Router) ranks #12 because it frequently selects expensive models. Quality-optimized routing and cost-optimized routing are different objectives. You can optimize for both, but you have to design for cost-quality tradeoff explicitly.

Source: Artifilog, "Best AI Model Routers in 2026: Honest Rankings That Cut Through the Hype"

The Mindcast AI analysis frames this as "loyal versus mercenary" architecture. A loyal router defaults to the provider's most expensive model and treats cost savings as a secondary benefit. A mercenary router optimizes for the user's cost-quality tradeoff first and treats provider loyalty as a secondary concern.

Source: Mindcast AI, "The Inference Control Layer: Capability Detection, the Routing Tax, Inference Arbitrage"

What intelligent routing actually saves.

The pricing spread between model tiers as of June 2026 makes the case:

| Model | Input ($/M tokens) | Output ($/M tokens) |
|---|---:|---:|
| Claude Opus 4.8 | $5.00 | $25.00 |
| GPT-5.5 | $5.00 | $30.00 |
| Claude Sonnet 4.6 | $3.00 | $15.00 |
| DeepSeek V4 | $1.74 | $3.48 |
| Claude Haiku 4.5 | $1.00 | $5.00 |
| Gemini 2.5 Flash | $0.30 | $2.50 |

The output price spread between Haiku and Opus is 5x. Between Gemini Flash and GPT-5.5, it is 12x. For every request that a gateway routes to Opus when Haiku would suffice, you pay 5x more than necessary.

The AICC's analysis of 2.4 billion API calls found that organizations with intelligent multi-model routing achieved median blended costs of $2.31 per million tokens, compared to $18.40 for frontier-only deployments. That is an 87% reduction. The savings come not from routing more tokens through a gateway, but from routing each token to the right model.
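To make the arithmetic concrete, here is a small sketch of how a blended price falls out of a traffic mix. The output prices are copied from the pricing table above; the 80/20 Haiku/Opus split is a hypothetical mix chosen for illustration, not a measured workload, and the function names are invented for this example.

```python
# Output prices in USD per million tokens, taken from the pricing table above.
OUTPUT_PRICE_PER_M = {
    "claude-opus-4.8": 25.00,
    "claude-haiku-4.5": 5.00,
    "gemini-2.5-flash": 2.50,
}


def blended_output_price(traffic_mix: dict[str, float]) -> float:
    """Weighted average output price for a mix of token shares that sum to 1."""
    if abs(sum(traffic_mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(share * OUTPUT_PRICE_PER_M[model] for model, share in traffic_mix.items())


def cost_reduction(baseline: float, routed: float) -> float:
    """Fractional saving of `routed` relative to `baseline`."""
    return 1.0 - routed / baseline


if __name__ == "__main__":
    frontier_only = blended_output_price({"claude-opus-4.8": 1.0})
    # Hypothetical mix: 80% of output tokens go to Haiku, 20% stay on Opus.
    routed = blended_output_price({"claude-haiku-4.5": 0.8, "claude-opus-4.8": 0.2})
    print(f"frontier-only: ${frontier_only:.2f}/M, routed: ${routed:.2f}/M")
    print(f"reduction: {cost_reduction(frontier_only, routed):.0%}")
    # The AICC medians quoted above, run through the same formula.
    print(f"AICC medians: {cost_reduction(18.40, 2.31):.0%}")
```

Under that assumed mix the blended output price is $9.00 per million tokens against $25.00 for always-Opus, a 64% reduction. The AICC medians work out to 87% through the same formula.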
Source: AICC, "Enterprise Token Costs Drop 67% Year-Over-Year as Multi-Model AI Adoption Hits Record High," May 2026

The verification gap.

Every router in the funded category (OpenRouter's Auto Router, Martian, Not Diamond) uses the same architecture: read the prompt, predict the model, ship the answer. When the prediction is wrong, the user gets a bad response and the router never finds out.

This is the architectural blind spot we wrote about in our verifier-gated cascade post. A prompt-only classifier maxes out at 96 to 97% accuracy on RouterBench-class evaluations. The ceiling exists because predicting output quality from input alone is a fundamentally harder problem than evaluating output quality after the fact.

A verifier-gated cascade changes the architecture. The cheap model answers first. A calibrated verifier (AUROC 0.961, ECE 0.016 on RouterBench held-out data) scores the answer before shipping. If the answer passes, you save 5x on that request. If it fails, you escalate to the next tier. The verifier adds 180 ms on the borderline path, but most requests skip it because the pre-classifier is confident.

The result on 11,420 held-out RouterBench triples: 60% cost reduction with 98% of always-Opus quality preserved. The prompt-only classifier at matched cost preserves 96.6%. The 1.7 percentage point difference is the verification gap.

At scale, the gap compounds. On a $100,000 monthly inference bill, the verification step prevents roughly $1,700 per month in quality-degrading misroutes that a predict-and-ship router would silently pass through. Over a year, that is $20,400 in quality-adjusted savings, on top of $60,000 per month ($720,000 per year) in direct cost reduction.

What this means for engineering teams.

The VC signal is unambiguous. The routing layer between your application and the LLM provider is becoming standard infrastructure, on par with API gateways, CDNs, and load balancers. The market is projected at $25.78 billion by 2034. Two companies are valued at $1.3 billion each.
A $100 billion security company just acquired a third. If you are still hardcoding model selection, you are on the wrong side of the market.

Here is the decision framework:

If you need provider failover and a unified API: A gateway like OpenRouter solves this well. Unified endpoint, 400+ models, automatic failover. Start here if you have no routing layer at all.

If you need cost optimization on a mixed-complexity workload: A gateway alone will not cut it. You need a decision layer that evaluates each request independently and routes to the cheapest model that can handle it. The 5x to 12x price spread between model tiers means intelligent routing captures 50% to 80% savings on most production workloads.

If you need both cost optimization and quality guarantees: The verification step is the difference between "we predicted this prompt was easy" and "we predicted this prompt was easy and the cheap model produced a passing answer." For production workloads where quality degradation has a business cost, the verification architecture pays for itself.

Where Nadir fits.

Nadir is built for the third category. The trained classifier evaluates each API call in under 10 ms and routes to the cheapest model that can handle it. The verifier-gated cascade checks borderline answers before shipping. On RouterBench held-out data, this yields a 60% cost reduction with 98% of always-Opus quality preserved.

The integration is two lines: change the base URL, set `model="auto"`. Per-request response headers (`x-nadir-routed-to`, `x-nadir-cost-usd`, `x-nadir-cost-saved`) show exactly where each call went and what it saved.

OpenRouter validated the market at $1.3 billion. Palo Alto validated the enterprise need. The capital markets just declared the routing layer mandatory. The remaining question is whether your routing layer manages traffic or optimizes decisions. The savings come from the decision.

Sources: BusinessWire, "OpenRouter Raises $113 Million CapitalG-led Series B" (May 2026). TechCrunch, "OpenRouter more than doubles valuation to $1.3B in a year" (May 2026). Palo Alto Networks, "Palo Alto Networks to Acquire Portkey to Secure the Rise of AI Agents" (April 2026). Cybersecurity Magazine, "Securing AI: Behind Palo Alto Networks' Portkey Acquisition" (2026). Let's Data Science, "DeepInfra Raises $107M" (2026). Market.us, "AI Inference Gateways Market". IDC, "Why the Future of AI Lies in Model Routing" (November 2025). AICC, "Enterprise Token Costs Drop 67% Year-Over-Year" (May 2026). OpenRouter Pricing. Mindcast AI, "The Inference Control Layer". Artifilog, "Best AI Model Routers in 2026". Anthropic, OpenAI, Google, DeepSeek model pricing as of June 2026.