Do I need to change my code?

No. Nadir exposes an OpenAI compatible API. Change your base URL to api.getnadir.com and set model to auto. That is the entire change.

How does routing decide?

A trained pre-classifier reads each prompt in under 10 ms. Confident routes ship straight from a Haiku-class model. Borderline routes get the cheap-model answer scored by a calibrated verifier (AUROC 0.961 on RouterBench held-out). If the verifier accepts, ship cheap; if it rejects, escalate to Sonnet or Opus. The router never ships an answer it has not verified.

On 11,420 RouterBench held-out triples, Nadir's verifier-gated cascade preserves 98% of always-Opus quality at 60% lower cost. Catastrophic-route rate is 1.7%. The verifier reads every borderline cheap-model answer before it ships, so quality drops are caught rather than absorbed silently. You can also set a per-API-key quality floor that pins traffic above your threshold to your configured premium model.

Do you store my prompts?

Only if you turn on logging. With BYOK and logging off, we never see your plaintext. Just headers and token counts.

Can I bring my own keys?

Yes. BYOK is supported on every tier, including Free. Your keys stay in your environment.

What if a provider is down?

Automatic failover. If Anthropic errors, Nadir retries against OpenAI or Google on your configured chain. Your app stays up.

Nadir is a verifier-gated cascade LLM router. It sits between your application and LLM providers (Anthropic, OpenAI, Google). The cheap model answers first, a calibrated verifier scores the answer before we ship it, and we escalate to a stronger model only when the verifier rejects. On 11,420 RouterBench held-out triples, Nadir cuts cost 60% versus always-Opus while preserving 98% of always-Opus quality. OpenAI compatible, two-line change.

What is a verifier-gated cascade?

A verifier-gated cascade is a routing architecture where the cheap model answers the prompt first, then a calibrated verifier model scores that answer for quality. If the verifier accepts, the cheap answer ships. If the verifier rejects, the request escalates to a stronger model (Sonnet, then Opus). The wedge against one-shot prompt-only routers (Not Diamond, Martian) is that quality drops are caught and surfaced, not absorbed. Nadir's verifier has AUROC 0.961 and ECE 0.016 on RouterBench held-out (n=11,420).

How is Nadir different from Not Diamond, Martian, OpenRouter, or Portkey?

Not Diamond and Martian route once with a meta-classifier from the prompt alone and ship whatever the chosen model produces. If the prediction is wrong, the user eats the bad answer. Nadir reads the cheap-model answer with a calibrated verifier (AUROC 0.961) before shipping, so routing mistakes are recoverable rather than absorbed. OpenRouter and Portkey hand you a catalogue and a fallback; you still pick the tier. Nadir picks the tier and verifies the outcome. On RouterBench held-out, Nadir preserves 98% of always-Opus quality at 60% lower cost.

Yes. Nadir is free and open-source under the MIT license. A hosted plan is also available with no base fee: you pay a variable savings fee only when Nadir cuts your bill (25% of the first $2,000 of monthly savings, 10% above), so if Nadir saves you nothing, you pay nothing.

What LLM providers does Nadir support?

Nadir supports Anthropic (Claude), OpenAI (GPT), and Google (Gemini) models out of the box. You can configure custom model tiers and routing rules for any OpenAI-compatible provider.

What are the benchmark results?

On 11,420 held-out RouterBench triples (eval JSON: verifier/reports/eval_composed_20260526T191001.json), Nadir's verifier-gated cascade preserves 98% of always-Opus quality at 60% lower cost. Catastrophic-route rate is 1.7%. Verifier AUROC is 0.961 and calibration ECE is 0.016 on the same held-out split. Pre-classifier overhead is under 10 ms; verifier latency is 180 ms on CPU INT8 when it runs, and most requests skip the verifier entirely because the pre-classifier is confident. On RouterArena's public scorer, Nadir scored arena_score 72.3, ranking #4 of 21 routers on the public leaderboard. A contamination audit confirmed zero prompt overlap between Nadir training corpora and the RouterArena eval splits. Savings vary with your prompt mix; the eval methodology and full threshold sweep are reproducible from the open-source eval harness.

What is the verifier's AUROC and calibration?

Verifier AUROC is 0.961 and Expected Calibration Error (ECE) is 0.016, measured on 11,420 RouterBench held-out triples (verifier/reports/eval_20260526T184516.json). AUROC measures the verifier's ability to discriminate good cheap-model answers from bad ones; ECE measures whether the verifier's confidence scores are well calibrated to actual accept rates. Both are reported on a held-out split disjoint from training (overlap_count=0).

A UC Berkeley paper showed 74% of GPT-4 calls don't need GPT-4. Anyscale deployed it in production and cut costs 85%. Here is the implementation guide.