The frontier price gap just blew open. In May 2026, DeepSeek released V4. It scores within 2 to 3 points of Claude Opus 4.7 and GPT-5.5 on major benchmarks (MMLU-Pro, HumanEval, MATH-500), and it costs a fraction of what they charge. Here is the current pricing for the three frontier-class models:

| Model | Input ($/M tokens) | Output ($/M tokens) | Cached input ($/M tokens) |
|-------|---------------------|----------------------|---------------------------|
| Claude Opus 4.7 | $5.00 | $25.00 | $2.50 |
| GPT-5.5 | $5.00 | $30.00 | $2.50 |
| DeepSeek V4 | $1.74 | $3.48 | $0.435 |

The input price gap is roughly 3x ($5.00 vs. $1.74). The output price gap is 7 to 9x ($25 or $30 vs. $3.48). On cached input, DeepSeek V4 costs roughly one-sixth as much as the US frontier models ($0.435 vs. $2.50).

This is not a comparison between a frontier model and a budget model. DeepSeek V4 is a frontier model. It matches Opus 4.7 on reasoning benchmarks, trails by a small margin on creative writing, and leads on several code generation tasks. VentureBeat called it a direct challenge to US frontier pricing.

The question is no longer whether cheaper models can handle serious workloads. They can. The question is which of your requests actually need the $25-per-million-output-tokens model.

Why the output token gap matters more than the input gap

Most cost discussions focus on input tokens. That made sense when prompts were long and responses were short. It does not hold for 2026 workloads.

Agentic systems generate substantial output. A coding agent that writes files, explains changes, and produces tool calls generates 2 to 5x more output tokens than input tokens on many turns. A RAG system that synthesizes long answers from retrieved context follows the same pattern. Chain-of-thought reasoning, which Opus 4.7 and GPT-5.5 both default to for complex queries, inflates output token counts further.

When output tokens dominate your bill, the 7x gap between DeepSeek V4 ($3.48/M) and Opus 4.7 ($25/M) is the number that matters.
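The following Python sketch splits a workload's cost into input and output shares, using the list prices from the pricing table. It makes a few assumptions. The model keys are informal labels, not official API model IDs. Caching is ignored. The example workload uses a 3:1 output-to-input ratio.

```python
# List prices in $ per million tokens (input, output), from the pricing table.
PRICES = {
    "claude-opus-4.7": (5.00, 25.00),
    "gpt-5.5": (5.00, 30.00),
    "deepseek-v4": (1.74, 3.48),
}


def cost_breakdown(model: str, input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (total cost in dollars, fraction of that cost due to output tokens)."""
    in_price, out_price = PRICES[model]
    input_cost = input_tokens * in_price / 1_000_000
    output_cost = output_tokens * out_price / 1_000_000
    total = input_cost + output_cost
    if total == 0:
        return 0.0, 0.0  # avoid dividing by zero for an empty workload
    return total, output_cost / total


# An output-heavy workload: 1M input tokens and 3M output tokens.
for model in PRICES:
    total, out_share = cost_breakdown(model, 1_000_000, 3_000_000)
    print(f"{model:16} ${total:7.2f}  output share {out_share:.0%}")
```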
On a workload that generates 1 million output tokens per day, the difference on output alone is $21.52 per day ($25.00 minus $3.48), or about $645 per month. For teams with agentic workloads where output is 3x input, roughly 75% of the tokens are output tokens. At Opus 4.7 prices, those output tokens account for about 94% of the token cost ($75 of every $80). Even at DeepSeek V4 prices, output is about 86% of the cost. The model you choose for output-heavy requests is the single biggest lever on your bill.

The new routing economics

Intelligent routing has always saved money by sending low-complexity requests to cheaper models. The economics depend on two variables: the price spread between models and the percentage of requests that can safely go to the cheaper tier. Both variables just shifted in favor of routing.

The price spread widened. Before DeepSeek V4, the practical spread between the cheapest capable model and the most expensive was roughly 5x (Haiku at $1/$5 vs. Opus at $5/$25). Now the spread between DeepSeek V4 and Opus 4.7 on output tokens is 7.2x. And DeepSeek V4 is not the floor. With volume discounts, DeepSeek V4-Pro drops to $0.435/$0.87, pushing the output spread to nearly 29x versus Opus 4.7.

The capable-cheap tier got more capable. DeepSeek V4 handles tasks that previously required a true frontier model, such as code review, multi-step reasoning, and structured analysis. This means more of your traffic can safely route to the cheaper tier without quality loss.

Here is what that looks like in practice. Assume a mixed workload of 100,000 requests per month, averaging 1,000 input tokens and 500 output tokens per request. That is 100M input tokens and 50M output tokens in total. Assume Haiku at $1/$5 and Sonnet at $3/$15 per million input/output tokens:

| Routing strategy | Monthly cost | Savings vs. all-Opus |
|------------------|-------------|----------------------|
| All Opus 4.7 | $1,750 | baseline |
| All DeepSeek V4 | $348 | 80% |
| Routed: 60% DeepSeek V4, 30% Sonnet, 10% Opus | $699 | 60% |
| Routed: 40% Haiku, 30% DeepSeek V4, 20% Sonnet, 10% Opus | $629 | 64% |

The "all DeepSeek V4" row looks tempting. But quality matters.
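The following Python sketch lets you check routing-mix figures against your own traffic. It assumes the following:

- Every request uses the stated averages (1,000 input and 500 output tokens).
- Caching is ignored.
- Sonnet costs $3/$15 per million tokens. That rate is not in the pricing table, so substitute the rate you actually pay.

The model keys are informal labels.

```python
# $ per million tokens (input, output). The Opus 4.7, DeepSeek V4, and Haiku rates
# come from the article; the Sonnet rate ($3/$15) is an assumption.
PRICES = {
    "opus": (5.00, 25.00),
    "deepseek-v4": (1.74, 3.48),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}

REQUESTS_PER_MONTH = 100_000
INPUT_TOKENS = 1_000   # average input tokens per request
OUTPUT_TOKENS = 500    # average output tokens per request


def per_request_cost(model: str) -> float:
    """Dollar cost of one average-sized request on the given model."""
    in_price, out_price = PRICES[model]
    return (INPUT_TOKENS * in_price + OUTPUT_TOKENS * out_price) / 1_000_000


def monthly_cost(mix: dict[str, float]) -> float:
    """Monthly cost of a routing mix mapping model -> share of requests."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("request shares must sum to 1")
    return sum(REQUESTS_PER_MONTH * share * per_request_cost(m) for m, share in mix.items())


strategies = {
    "All Opus 4.7": {"opus": 1.0},
    "All DeepSeek V4": {"deepseek-v4": 1.0},
    "60% DeepSeek V4 / 30% Sonnet / 10% Opus": {"deepseek-v4": 0.6, "sonnet": 0.3, "opus": 0.1},
    "40% Haiku / 30% DeepSeek V4 / 20% Sonnet / 10% Opus": {
        "haiku": 0.4, "deepseek-v4": 0.3, "sonnet": 0.2, "opus": 0.1,
    },
}

baseline = monthly_cost(strategies["All Opus 4.7"])
for name, mix in strategies.items():
    cost = monthly_cost(mix)
    print(f"{name:52} ${cost:8,.0f}  savings {1 - cost / baseline:.0%}")
```

To estimate your own workload, replace the request count and token averages with measurements from production logs.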
DeepSeek V4 trails Opus 4.7 on specific tasks: nuanced creative writing, complex multi-turn conversations with heavy context, and certain edge cases in code refactoring. The routed strategies preserve quality on hard requests while capturing most of the savings.

This compounds in agentic workflows

Single-request savings are meaningful. Agentic savings are transformative. Here is why.

An agentic session with 30 turns re-sends the full context on every turn. The context grows with each turn, and every turn pays again for everything that came before it. A session that starts at 2,000 input tokens per turn can reach 30,000 by turn 30. If the context grows roughly linearly, total input across the 30 turns is roughly 480,000 tokens (an average of 16,000 per turn). Total output is roughly 150,000 tokens, assuming 5,000 output tokens per turn.

At Opus 4.7 pricing, that single session costs $6.15 ($2.40 input plus $3.75 output). At DeepSeek V4 pricing, it costs $1.36 ($0.84 input plus $0.52 output). Difference: $4.79 per session.

A team running 500 agentic sessions per day would save $2,395 per day, or $71,850 per month, if every session moved from Opus 4.7 to DeepSeek V4. Even conservative routing (40% of sessions shifted to the cheaper model) saves $28,740 per month.

The compounding is why agentic workloads are the highest-leverage target for routing. Each additional turn amplifies the cost difference between models.

The quality tradeoff is not binary

The temptation is to go all-in on DeepSeek V4 and pocket the 80% savings. Teams that do this will regret it within weeks.

DeepSeek V4 performs within 2 to 3 percentage points of Opus 4.7 on aggregate benchmarks. But benchmarks are averages. Specific tasks show wider gaps:

- Complex multi-file code refactors. Opus 4.7 maintains coherence across files better than DeepSeek V4 on refactors touching 5+ files with interdependencies.
- Nuanced instruction following. Prompts with layered constraints (tone, format, audience, technical depth) see higher compliance rates on Opus 4.7.
- Long-context reasoning. At 100K+ token contexts, Opus 4.7 shows better recall and synthesis, particularly on contradictory information in the source material.
- Safety-critical outputs. Medical, legal, and financial content benefits from Opus 4.7's more conservative and thorough reasoning.

The right strategy is not "cheapest model always" or "best model always." It is "cheapest model that can handle this specific request."

That is what a trained classifier does. It reads the prompt, estimates the complexity, and routes accordingly. For the 60 to 70% of requests that are classifications, summaries, formatting, simple Q&A, and code completions, DeepSeek V4 or Haiku is more than sufficient. For the 10 to 15% that genuinely need frontier reasoning, Opus 4.7 is worth every token.

What changed for multi-provider strategies

Before May 2026, most routing strategies operated within a single provider's model family: route between Haiku, Sonnet, and Opus, or between GPT-4o-mini and GPT-5. The models share tokenizers, API formats, and behavioral patterns. It is clean.

DeepSeek V4 breaks that pattern. The savings are too large to ignore, but adding a second provider introduces complexity:

- Tokenizer differences. DeepSeek V4 uses a different tokenizer than Anthropic or OpenAI, so the same prompt produces a different token count on each provider. Cost estimates need to account for this, and token-based budgets need normalization.
- Behavioral differences. System prompt handling, tool call formatting, and response style differ between providers. A prompt tuned for Claude may need adjustment for DeepSeek, particularly around structured output formatting.
- Latency variance. DeepSeek V4 latency varies by region and load. Teams routing latency-sensitive requests need to factor time-to-first-token into routing decisions, not just cost.
- Availability. DeepSeek API availability has historically been less consistent than Anthropic's or OpenAI's. A routing strategy that depends on DeepSeek V4 needs a failover chain.
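The Python sketch below combines per-request routing with a failover chain. Everything in it is hypothetical:

- Provider wrappers are plain callables standing in for real SDK calls.
- `ProviderError` is an invented exception type.
- The word-count `classify` function is a crude placeholder for a trained classifier.
- The tier chains are illustrative orderings, not recommendations.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical provider wrapper: takes a prompt and returns completion text.
# Real Anthropic, OpenAI, or DeepSeek SDK calls would be wrapped to fit this shape.
ProviderCall = Callable[[str], str]


class ProviderError(Exception):
    """Raised by a wrapper on timeouts, 5xx responses, or rate limits (hypothetical)."""


# Illustrative failover chains per complexity tier, cheapest capable model first.
FAILOVER_CHAINS: Dict[str, List[str]] = {
    "simple": ["deepseek-v4", "haiku", "sonnet"],
    "medium": ["sonnet", "deepseek-v4", "opus-4.7"],
    "hard": ["opus-4.7", "gpt-5.5"],
}


def classify(prompt: str) -> str:
    """Placeholder heuristic standing in for a trained complexity classifier."""
    words = len(prompt.split())
    if words < 50:
        return "simple"
    if words < 500:
        return "medium"
    return "hard"


def route_with_failover(prompt: str, providers: Dict[str, ProviderCall]) -> Tuple[str, str]:
    """Try each model in the tier's chain in order; return (model, completion)."""
    errors = []
    for model in FAILOVER_CHAINS[classify(prompt)]:
        call = providers.get(model)
        if call is None:
            continue  # model not configured in this deployment
        try:
            return model, call(prompt)
        except ProviderError as exc:
            errors.append(f"{model}: {exc}")  # record the failure, try the next model
    raise ProviderError("all providers failed: " + "; ".join(errors))


# Usage with fake providers: DeepSeek is "down", so the request falls back to Haiku.
def flaky_deepseek(prompt: str) -> str:
    raise ProviderError("503 Service Unavailable")


def fake_haiku(prompt: str) -> str:
    return "haiku answer"


print(route_with_failover("Summarize this paragraph.", {"deepseek-v4": flaky_deepseek, "haiku": fake_haiku}))
# → ('haiku', 'haiku answer')
```

Each chain is ordered by price, so the cheapest capable model is tried first. When that provider is unavailable, the request falls back to a pricier one. A gateway provides this behavior centrally instead of in every application.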
These are solvable problems. An LLM gateway that handles provider normalization, token counting, and failover abstracts the complexity away from your application code. You get multi-provider savings without multi-provider headaches.

The $690 billion backdrop

This pricing war is happening against a staggering backdrop of AI infrastructure spending. Amazon, Alphabet, Microsoft, Meta, and Oracle are on track to spend over $600 billion on capex in 2026, the majority of it on AI infrastructure. Gartner projects worldwide AI spending at $2.5 trillion for the year.

That spending creates pressure in both directions. Providers need to monetize their massive investments, which keeps frontier pricing high. At the same time, competitors like DeepSeek can undercut on price because their infrastructure costs are lower and they are willing to subsidize usage for market share.

For buyers, this means the pricing spread will likely widen further before it narrows. New models from Chinese labs, European open-source projects, and specialized providers will continue to offer frontier-adjacent capabilities at a fraction of US frontier pricing.

Routing becomes more valuable as the spread widens. A 3x price gap makes routing worth considering. A 7x gap makes it obvious. A 29x gap (DeepSeek V4-Pro volume pricing vs. Opus 4.7 output) makes it negligent not to route.

Practical next steps

If you are spending $1,000+ per month on LLM inference:

1. Audit your prompt complexity distribution. What percentage of your requests are classifications, summaries, simple Q&A, or formatting? If it is above 50%, routing will cut your bill substantially.
2. Benchmark DeepSeek V4 on your actual workload. Run your last 1,000 production prompts through V4 and compare outputs to your current model. Track quality on a per-category basis, not in aggregate.
3. Set up per-request routing. A classifier evaluates each prompt and sends it to the cheapest model that can handle it. Simple requests go to Haiku or DeepSeek V4. Complex requests stay on Opus 4.7. Everything in between goes to Sonnet.
4. Monitor with per-request cost headers. After routing, every request should tell you what it cost and what it saved. Without observability, you are guessing.

Where Nadir fits

Nadir's trained classifier evaluates each prompt in under 10 ms and routes it to the cheapest model that can handle it. When the cheapest capable model is DeepSeek V4 instead of Sonnet, the per-request savings on output tokens rise from 2 to 3x to about 7x.

The integration is two lines: change the base URL and set model="auto". Nadir handles provider normalization, tokenizer differences, and failover. The x-nadir-cost-saved response header on every request shows the difference.

The wider the price gap between models, the more routing saves. DeepSeek V4 just made that gap the widest it has ever been.

Sources:

- VentureBeat, "DeepSeek V4 arrives with near state-of-the-art intelligence at 1/6th the cost of Opus 4.7" (May 2026).
- MindStudio, "DeepSeek V4 vs GPT-5.5 vs Claude Opus 4.7 Pricing Comparison" (May 2026).
- DataCamp, "Claude Opus 4.7 vs DeepSeek V4" (May 2026).
- Futurum, "AI Capex 2026: The $690B Infrastructure Sprint" (2026).
- Gartner, "Worldwide AI Spending Will Total $2.5 Trillion in 2026" (January 2026).
- Anthropic, OpenAI, and DeepSeek model pricing as of May 2026.