Two years ago, 31% of FinOps teams managed AI spend. Today, 98% do.

The FinOps Foundation's sixth annual State of FinOps report surveyed 1,192 respondents representing over $83 billion in annual cloud spend. The headline finding on AI is stark: in 2024, 31% of FinOps practitioners managed AI costs. In 2025, that number hit 63%. In 2026, 98% expect to manage AI spend within the year. FinOps for AI is now the top forward-looking priority across organizations of all sizes.

This is not a gradual trend. It is a phase change. AI spend went from a niche concern to a universal one in 24 months. The question is no longer whether finance teams track AI costs. It is whether they can actually see them.

Source: FinOps Foundation, "State of FinOps 2026 Report," 2026
Source: CloudKeeper, "State of FinOps 2026 Report: Key Trends, Insights, and What Comes Next"

The #1 challenge: visibility into where the tokens go.

The report identifies three interconnected challenges when extending FinOps to AI workloads.

The top one is visibility. Practitioners cannot see AI costs at the level of detail they need. Cloud FinOps has years of tooling for tracking provisioned resources priced per hour or per GB. AI FinOps is different. A single workflow might hit an inference API, a vector database, a tool API, and a GPU cluster for fine-tuning. Each has its own pricing model, billing cycle, and unit of measure. Token-based LLM billing does not look like anything in the traditional cloud cost stack.

The second challenge is allocation. AI usage is embedded inside product features, internal workflows, and agent chains that cross teams and systems. Attributing cost to a business unit requires tagging at the request level, not the service level.

The third is ROI. Without per-request cost data, teams cannot calculate what a feature, customer, or transaction actually costs in inference spend. They know the total bill. They do not know which parts of the product are driving it.

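
To make the allocation and ROI gaps concrete, the following is a minimal sketch of per-request cost attribution. It assumes each request record carries a model name, token counts, and a business tag. The `RequestRecord` fields, the `feature` tag values, and the short model keys are hypothetical, and the rates are the per-million-token figures this article quotes for Opus and Haiku, not an authoritative price list. Cached-token and batch discounts are left out for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass

# Illustrative USD rates per million tokens as (input, output), taken from the
# figures quoted in this article. Real rates vary by provider and over time.
PRICES_PER_MTOK = {
    "opus": (5.00, 25.00),
    "haiku": (1.00, 5.00),
}


@dataclass
class RequestRecord:
    """One LLM API call, tagged at the request level (hypothetical schema)."""
    model: str
    input_tokens: int
    output_tokens: int
    feature: str  # business tag used for allocation


def request_cost_usd(rec: RequestRecord) -> float:
    """Cost of a single request from its token counts and model rates."""
    in_rate, out_rate = PRICES_PER_MTOK[rec.model]
    return (rec.input_tokens * in_rate + rec.output_tokens * out_rate) / 1_000_000


def cost_by_feature(records: list[RequestRecord]) -> dict[str, float]:
    """Sum per-request costs under each feature tag."""
    totals: dict[str, float] = defaultdict(float)
    for rec in records:
        totals[rec.feature] += request_cost_usd(rec)
    return dict(totals)


# Made-up sample data for illustration only.
records = [
    RequestRecord("opus", 2_000, 500, "contract-review"),
    RequestRecord("haiku", 2_000, 500, "ticket-triage"),
    RequestRecord("opus", 1_000, 200, "ticket-triage"),
]

for feature, usd in cost_by_feature(records).items():
    print(f"{feature}: ${usd:.4f}")
```

With these illustrative records, the same 2,000-in / 500-out request costs five times more at the Opus rate than at the Haiku rate, and the per-feature totals show where the spend lands, which a single invoice total cannot.
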
The top tooling request in the entire 2026 survey is granular monitoring of AI spend: tokens, LLM requests, and GPU utilization. Commercial tooling has not yet delivered this at scale.

Source: Virtasant, "State of FinOps 2026 Signals Expansive Future for Practitioners"
Source: USU, "6 Takeaways from the State of FinOps Report 2026"

The gap between cloud FinOps and AI FinOps is structural.

Cloud FinOps is a solved problem in 2026. Teams know how to track EC2 hours, S3 storage, and Lambda invocations. The billing data is granular, the tagging is standardized, and the tooling is mature.

AI FinOps is at the stage cloud FinOps was in 2018. The billing data exists but it is aggregated at the wrong level. A monthly Anthropic invoice tells you total tokens consumed across all API keys. It does not tell you which feature consumed them, which user triggered them, or which requests could have gone to a cheaper model.

The unit economics question is especially pointed for LLM inference. Cloud compute has relatively stable per-unit costs. LLM inference costs vary by model, by prompt length, by output length, by whether the input was cached, and by whether the request was batched. A single API call to Claude Opus 4.7 can cost 25x more than the same call to Haiku 4.5, depending on output length. Without per-request tracking, teams cannot tell which model handled which request or what it cost.

This is why the FinOps Foundation updated its framework in 2026 to include AI-specific categories. The old framework assumed provisioned resources with predictable pricing. The new one accounts for consumption-based billing where cost per unit fluctuates with every request.

Source: FinOps Foundation, "FinOps Framework 2026: Executive Strategy, Technology Categories, and Converging Disciplines"

Organizations are being asked to self-fund AI through optimization savings.

The report surfaces a dynamic that many engineering leads will recognize.
Organizations are being told to fund AI investments by finding savings in their existing cloud footprint. The logic: optimize what you already spend, and redirect the freed-up budget to AI initiatives.

This creates a direct feedback loop. The faster you reduce waste in your AI inference bill, the more budget you have for AI expansion. But you cannot reduce waste you cannot see. Without per-request cost visibility, optimization is guesswork.

You can negotiate volume discounts with your provider. You can set hard budget caps. But you cannot identify which requests are overpaying for model capability they do not need.

This is the difference between FinOps (governance and visibility) and cost optimization (routing and model selection). FinOps tells you how much you spent. Optimization decides how much you should have spent. The State of FinOps report covers the first. The second requires a routing layer.

What per-request visibility actually looks like.

The tooling gap the report identifies has a specific shape. Here is what teams need and what most do not have:

1. Per-request model attribution. Which model handled each API call? If your application sends model="auto" or uses a gateway, did the request go to Opus, Sonnet, or Haiku? Without this, you cannot calculate cost per request accurately. Different models have 5x to 25x price differences on the same prompt.

2. Per-request cost calculation. Input tokens, output tokens, cached tokens, and the model-specific rate for each. This needs to happen at the request level, not aggregated by day or by API key. A daily aggregate hides the distribution. Ten cheap requests and one expensive one average out to a medium cost that describes none of them.

3. Per-request savings attribution. If you are routing, how much did each routing decision save compared to the default (most expensive) model? This is the number that justifies the routing layer.
Without it, you are trusting the vendor's benchmark instead of measuring your own workload.

4. Metadata tagging at the request level. User ID, feature name, environment, team. This is what makes allocation possible. The FinOps Foundation's report says allocation is the second-hardest challenge. It is hard because the data is not tagged at the right granularity.

5. Real-time access, not batch reports. Monthly invoices and weekly dashboards are not fast enough to catch anomalies. An agent loop that burns $30 on a $0.50 task needs to be visible within minutes, not at the end of the billing cycle.

The FinOps tooling vendors (Vantage, Amnic, Finout) are building toward this. Vantage launched LLM Token Allocation in private preview in 2026, joining token observability data to cost rows with per-model, per-team, and per-customer allocation. But most of these tools sit outside the request path. They ingest billing data after the fact.

Source: Vantage, "AI Cost Observability: Measuring and Justifying Token Spend in 2026"

The routing layer is the observability layer.

There is a simpler path to per-request visibility. If every API call goes through a routing proxy, the proxy can log per-request cost data as a side effect of routing.

This is not a novel insight. It is how cloud cost management evolved. AWS did not ship cost allocation tags on day one. Third-party tools and API gateways added the metadata that made FinOps possible. The same pattern is playing out for AI inference, but faster, because the billing model (per-token, variable by model) makes visibility even more critical.

A routing layer that sits between your application and the LLM provider sees every request. It knows the model, the token counts, the cost, and the routing decision. It can tag each request with application metadata (user, feature, environment) and expose per-request cost data in real time.

The FinOps Foundation's report says the #1 missing feature is granular monitoring of AI spend.
A routing proxy that logs per-request cost data delivers exactly that, as a byproduct of making the routing decision.

The numbers on what visibility enables.

Enterprise data from the AICC's analysis of 2.4 billion API calls shows the impact of combining visibility with routing. Organizations that implemented intelligent multi-model routing achieved median blended costs of $2.31 per million tokens, compared to $18.40 for frontier-only deployments. That is an 87% reduction.

But the reduction did not come from routing alone. It came from routing informed by visibility. Teams that could see per-request cost data identified which request types were overpaying for model capability. They set routing thresholds based on observed quality, not assumptions. They measured the actual savings per routing decision and adjusted.

The FinOps Foundation survey cited earlier (1,192 respondents) found that teams with AI financial guardrails in place spend 3.2x less per completed task than teams without them. The guardrails are only as good as the data feeding them.

IDC predicts that by 2028, 70% of top AI-driven enterprises will use dynamic model routing. The prediction implicitly assumes per-request cost visibility as a prerequisite. You cannot route dynamically if you cannot see what each route costs.

Source: AICC, "Enterprise Token Costs Drop 67% Year-Over-Year as Multi-Model AI Adoption Hits Record High," May 2026
Source: IDC Blog, "Why the Future of AI Lies in Model Routing," November 2025

Three things engineering teams should do now.

1. Instrument per-request token costs today. Do not wait for your FinOps tool to ship an AI module. Log input tokens, output tokens, model, and cost for every LLM API call. A week of data is enough to identify your top cost drivers. Most teams discover that 50 to 70% of their token spend goes to requests that did not need a frontier model.

2. Tag requests with business metadata. User ID, feature name, team, environment.
Without tags, you can see total cost but not cost per feature or cost per customer. The allocation problem the FinOps report identifies is a tagging problem. Solve it at the request level and the aggregation follows.

3. Route based on what you see. Once you have per-request cost data, the routing decisions become obvious. Classification requests going to Opus at $5/$25 per million input/output tokens should go to Haiku at $1/$5. Formatting and summarization do not need frontier reasoning. A trained classifier can make these decisions in under 10 ms per request, but the visibility comes first.

Where Nadir fits.

Nadir is a routing proxy that also solves the visibility problem the FinOps Foundation identified. Every request through Nadir returns per-request response headers: x-nadir-routed-to (which model handled the request), x-nadir-cost-usd (what it cost), x-nadir-cost-saved (what was saved versus the default model), and x-nadir-latency-ms (routing overhead). The dashboard aggregates these by day, week, month, feature, and API key.

This is the granular monitoring that the State of FinOps survey ranks as its top tooling request. It ships as a side effect of the two-line integration: change the base URL, set model="auto". No separate observability pipeline. No CSV uploads. No private preview waitlist.

For teams that need to show finance where the AI budget is going, per-request cost headers are the answer. For teams that need to reduce that budget, routing is the answer. They are the same integration.

Sources:
FinOps Foundation, "State of FinOps 2026 Report" (2026).
CloudKeeper, "State of FinOps 2026 Report: Key Trends, Insights, and What Comes Next".
Virtasant, "State of FinOps 2026 Signals Expansive Future for Practitioners".
USU, "6 Takeaways from the State of FinOps Report 2026".
FinOps Foundation, "FinOps Framework 2026: Executive Strategy, Technology Categories, and Converging Disciplines".
Vantage, "AI Cost Observability: Measuring and Justifying Token Spend in 2026".
AICC, "Enterprise Token Costs Drop 67% Year-Over-Year as Multi-Model AI Adoption Hits Record High" (May 2026).
IDC Blog, "Why the Future of AI Lies in Model Routing" (November 2025).
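
As a closing illustration of the per-request headers described under "Where Nadir fits", here is a hedged sketch of turning them into cost records. Only the four header names come from this article. The value formats (plain decimal strings, USD for both cost fields), the `RoutedCall` type, the `parse_nadir_headers` and `savings_rate` helpers, and the sample values are assumptions for illustration, not Nadir's documented API.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RoutedCall:
    """Per-request routing and cost data (hypothetical record type)."""
    routed_to: str
    cost_usd: float
    cost_saved_usd: float
    latency_ms: float


def parse_nadir_headers(headers: Mapping[str, str]) -> RoutedCall:
    """Build a record from response headers; value formats are assumed."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    return RoutedCall(
        routed_to=h["x-nadir-routed-to"],
        cost_usd=float(h["x-nadir-cost-usd"]),
        cost_saved_usd=float(h["x-nadir-cost-saved"]),
        latency_ms=float(h["x-nadir-latency-ms"]),
    )


def savings_rate(calls: list[RoutedCall]) -> float:
    """Share of the default-model bill avoided: saved / (spent + saved)."""
    spent = sum(c.cost_usd for c in calls)
    saved = sum(c.cost_saved_usd for c in calls)
    baseline = spent + saved
    return saved / baseline if baseline else 0.0


# Made-up header values for illustration only.
example_headers = {
    "X-Nadir-Routed-To": "haiku",
    "X-Nadir-Cost-USD": "0.0045",
    "X-Nadir-Cost-Saved": "0.0180",
    "X-Nadir-Latency-Ms": "8",
}

call = parse_nadir_headers(example_headers)
print(call)
print(f"savings rate: {savings_rate([call]):.0%}")
```

Storing one such record per request, alongside the application's own tags, is enough to rebuild the per-feature and per-customer views discussed above without waiting for the monthly invoice.
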