The number that should change how you think about your AI bill. OpenAI projects $14 billion in losses for 2026. Not revenue. Not spending. Losses. The company expects to generate roughly $13 billion in revenue this year while spending $20 to $21 billion. That gap is not a bug in their business model. It is the business model. They are pricing inference below cost to capture market share, and your API bill is the subsidy. Source: The Information, "OpenAI Projections Imply Losses Tripling to $14 Billion in 2026," 2026 Source: Yahoo Finance, "OpenAI's own forecast predicts $14 billion loss in 2026," 2026 This is not a short-term dip. OpenAI's internal projections show $44 billion in cumulative losses through 2028 before the company expects to turn a profit sometime in 2029. HSBC analysts concluded that OpenAI likely will not make money by 2030 and still faces a $207 billion funding shortfall to power its growth plans. Source: R&D World, "Facing $14B losses in 2026, OpenAI is now seeking $100B in funding," 2026 Source: Windows Central, "OpenAI could lose $14 billion in 2026, becoming bankrupt by 2027," 2026 Only 5% of ChatGPT's 800 million users pay. Sam Altman publicly admitted OpenAI loses money on $200-per-month ChatGPT Pro subscriptions. When they shut down Sora in early 2026, the platform was reportedly burning $15 million per day in inference costs against $2.1 million in lifetime revenue. Source: MindStudio, "Inference Costs Are the New AI Wall: What Sora's Shutdown Tells Us About the Industry," 2026 The prices you pay today are not market rates. They are customer acquisition costs. Every major provider is running the same play. This is not an OpenAI problem. It is an industry-wide pricing strategy. Google slashed Gemini 3.5 Flash pricing at I/O 2026 to $1.50 per million input tokens and $9.00 per million output, undercutting its own Gemini 3.1 Pro by 25%. Investors immediately flagged the risk: aggressive discounting compresses Google Cloud margins. Source: Seeking Alpha, "Google introduces new pricing tiers for Gemini based on inference usage," 2026 Source: The National, "Google lowers Gemini pricing and says AI can save companies $1bn a year," 2026 Meta gives away Llama inference for free on its platforms and subsidizes open-weight hosting costs to commoditize the layer that its competitors monetize. The xAI federal contract signed in June 2026 gives all US government agencies access to Grok 4 for $0.42 per agency for 18 months. That is a rounding error, not a price. Anthropic is the one partial exception. The company filed its S-1 confidentially with the SEC on June 1, 2026, at a $965 billion valuation, and projects its first profitable quarter in Q2 2026 with $559 million in operating profit on $10.9 billion in revenue. But Anthropic's path to profitability is driven by high revenue per token from enterprise API customers, not by discounting. Its $47 billion annualized run rate comes largely from companies paying full API rates. Source: CNBC, "Anthropic confidentially files IPO prospectus with SEC," June 2026 Source: BuildMVPFast, "Anthropic S-1 Filing 2026: $965B IPO Analysis," 2026 The point is the same regardless of which provider you use: the current price level is structurally unstable. Providers are either losing money or extracting high margins from enterprise customers who do not optimize. Neither equilibrium is permanent. What happens when subsidies end. Industry analysts are not subtle about the direction. Arcade.dev's analysis of inference economics concluded that API prices are likely to increase for frontier models within 12 to 24 months as the subsidized pricing race winds down and capital discipline returns. Some analysts project increases of 3 to 10x to reach sustainable unit economics. Source: Arcade.dev, "Why AI Inference Is Underpriced for Enterprise AI," 2026 Source: MindStudio, "The Free Sample Phase: Why AI Tools Are Underpriced and What Comes Next," 2026 Source: UpTech Studio, "The True Cost of AI: When the Subsidies Run Out," 2026 The correction does not have to be dramatic to be painful. A 2x increase in frontier API pricing would double the bill of every team running always-Opus or always-GPT-5.5 workflows. For enterprise teams spending $50,000 to $100,000 per month on inference, that is a $600,000 to $1.2 million annual increase. Meanwhile, the signals are already arriving. GitHub moved Copilot to metered billing on June 1. Anthropic meters Claude Code agent usage starting June 15. OpenAI shifted Codex to per-token pricing in April. Three platforms moved from flat-rate to usage-based pricing in 30 days. The free sample phase is ending. The pattern is consistent: as agentic workloads consume 50x more tokens than chat-era usage, every provider is moving toward making the per-token cost visible. The next step is making the per-token cost sustainable. The correction math for a typical team. Consider a team spending $30,000 per month on LLM inference today, running everything through a frontier model at $5/$25 per million tokens (input/output). | Scenario | Monthly bill | Annual cost | |---|---:|---:| | Current subsidized rate | $30,000 | $360,000 | | 2x price correction | $60,000 | $720,000 | | 3x price correction | $90,000 | $1,080,000 | | Current rate + routing (60% savings) | $12,000 | $144,000 | | 3x correction + routing (60% savings) | $36,000 | $432,000 | The last row is the critical one. A team that routes today at subsidized prices pays $12,000 per month. The same team with routing survives a 3x price correction at $36,000, still above today's unrouted bill. A team without routing at 3x pays $90,000. Routing does not just save money at current prices. It compresses the variance of future price scenarios. The worst case with routing is better than the base case without it. The optimization surface is wider than you think. The savings from routing come from a simple observation: most API calls do not need a frontier model. Datadog's State of AI Engineering 2026 report measured production telemetry across thousands of companies and found that 69% of all input tokens are system prompts, tool schemas, and policy definitions that repeat on every call. Source: Datadog, "State of AI Engineering 2026" The AICC analyzed 2.4 billion enterprise API calls and found that organizations with intelligent multi-model routing achieved median blended costs of $2.31 per million tokens versus $18.40 for organizations without routing. That is an 87% difference at today's subsidized prices. Source: AICC, "Enterprise Token Costs Drop 67% Year-Over-Year," May 2026 The tier spread between models is not shrinking. Claude Haiku 4.5 costs $1/$5 per million tokens. Claude Opus 4.8 costs $5/$25. GPT-5.4 costs $1.25/$10. DeepSeek V4 costs $1.74/$3.48. The output token gap between the cheapest capable model and the most expensive frontier model is 7x or more. That gap is structural. It exists because smaller models genuinely cost less to run, not because of subsidies. When prices normalize upward, the absolute savings from routing grow proportionally. A 60% reduction on a $5/$25 rate card saves $15 per million output tokens. A 60% reduction on a $15/$75 rate card saves $45 per million output tokens. Routing gets more valuable as prices rise, not less. Model-agnostic architecture is the hedge. The safest enterprise strategy is to build model-agnostic workflows today so that switching providers or moving to local inference is an operational decision rather than a re-engineering project. Source: Arcade.dev, "Why AI Inference Is Underpriced for Enterprise AI," 2026 This is what a routing layer provides. Instead of hardcoding a single provider and model into every call site, you route through a classification layer that matches each request to the cheapest model that can handle it. When prices change, you adjust the routing thresholds. When new models launch, you add them to the pool. When a provider has an outage, you fail over to the next tier. The FinOps Foundation surveyed 1,192 organizations managing $83 billion in cloud spend and found that 98% of FinOps teams now manage AI costs, up from 31% two years ago. But their top challenge remains the same: they cannot see token-level costs per request, per feature, or per user. A routing layer with per-request analytics closes that gap. Source: FinOps Foundation, "State of FinOps 2026" IDC predicts 70% of top AI enterprises will use dynamic model routing by 2028. The VC market agrees: over $250 million flowed into the routing layer in a single month in early 2026, with OpenRouter raising $113 million at a $1.3 billion valuation and Palo Alto Networks acquiring Portkey for roughly $130 million. Source: IDC, "The Future of AI Is Model Routing," 2026 What to do before prices move. The correction timeline is uncertain. It could be 6 months, 12 months, or 24 months. But the direction is not uncertain. Providers cannot lose $14 billion a year indefinitely. Capital markets will demand profitability, and the primary lever is pricing. Three steps that take less than a day and reduce your exposure: 1. Audit your model usage. Pull your API logs and count how many requests go to frontier models versus cheaper alternatives. Most teams find that 60 to 80% of their calls are simple classification, formatting, file reads, or boilerplate that a $1/M token model handles identically to a $5/M token model. 2. Add a routing layer. Route simple requests to cheap models, mid-complexity to Sonnet-class models, and only complex reasoning tasks to frontier models. The blended cost drops immediately, and you gain the architectural flexibility to adjust when prices change. 3. Set up per-request cost tracking. If you cannot see cost per request, you cannot optimize. Tag each request with the model used, tokens consumed, and estimated cost. When a price change lands, you will know the impact in minutes instead of waiting for the monthly invoice. The teams that treat current API prices as permanent are building on a subsidy. The teams that treat routing as infrastructure are building a hedge. When prices correct, the first group scrambles. The second group adjusts a config. Sources: The Information, "OpenAI Projections Imply Losses Tripling to $14 Billion in 2026". Yahoo Finance, "OpenAI's own forecast predicts $14 billion loss in 2026". R&D World, "Facing $14B losses in 2026". Windows Central, "OpenAI could lose $14 billion in 2026". MindStudio, "Inference Costs Are the New AI Wall". Seeking Alpha, "Google introduces new pricing tiers for Gemini". The National, "Google lowers Gemini pricing". CNBC, "Anthropic confidentially files IPO prospectus". Arcade.dev, "Why AI Inference Is Underpriced". MindStudio, "The Free Sample Phase". UpTech Studio, "The True Cost of AI". Datadog, "State of AI Engineering 2026". AICC, "Enterprise Token Costs Drop 67%". FinOps Foundation, "State of FinOps 2026". IDC, "The Future of AI Is Model Routing". Anthropic, OpenAI, Google model pricing as of June 2026.