NadirClaw Documentation

Open-source LLM router that classifies prompt complexity and routes to the optimal model. Cut AI API costs by 40–70% with zero code changes.

Installation

pip (recommended)

pip install nadirclaw

One-line install script

curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/install.sh | sh

This clones to ~/.nadirclaw, creates a venv, installs deps, and adds nadirclaw to your PATH. Run it again to update.

From source

git clone https://github.com/doramirdor/NadirClaw.git
cd NadirClaw
python3 -m venv venv
source venv/bin/activate
pip install -e .

Docker

git clone https://github.com/doramirdor/NadirClaw.git && cd NadirClaw
docker compose up

Uninstall

rm -rf ~/.nadirclaw
sudo rm -f /usr/local/bin/nadirclaw

Quick Start

# Install
pip install nadirclaw

# Interactive setup (providers, API keys, models)
nadirclaw setup

# Start the router
nadirclaw serve --verbose

NadirClaw starts on http://localhost:8856 with sensible defaults (Gemini Flash for simple, Codex for complex). If you skip nadirclaw setup, the serve command will offer to run it on first launch.

First Run

On first request, NadirClaw downloads the all-MiniLM-L6-v2 sentence embedding model (~80 MB). This takes 2–3 seconds. Subsequent requests classify in ~10ms.

Prerequisites: Python 3.10+ and at least one LLM provider — a Gemini API key (free tier available), Ollama running locally (free), or any cloud provider API key.

Environment Variables

NadirClaw loads config from ~/.nadirclaw/.env. If that doesn't exist, it falls back to .env in the current directory.

| Variable | Default | Description |
|---|---|---|
| NADIRCLAW_SIMPLE_MODEL | gemini-3-flash-preview | Model for simple prompts |
| NADIRCLAW_COMPLEX_MODEL | openai-codex/gpt-5.3-codex | Model for complex prompts |
| NADIRCLAW_REASONING_MODEL | falls back to complex | Model for reasoning tasks |
| NADIRCLAW_FREE_MODEL | falls back to simple | Free/local fallback model |
| NADIRCLAW_FALLBACK_CHAIN | all tier models | Comma-separated cascade on failure |
| NADIRCLAW_CONFIDENCE_THRESHOLD | 0.06 | Classification threshold (lower = more complex) |
| NADIRCLAW_PORT | 8856 | Server port |
| NADIRCLAW_AUTH_TOKEN | empty (auth disabled) | Bearer token requirement |
| NADIRCLAW_LOG_DIR | ~/.nadirclaw/logs | Log directory |
| NADIRCLAW_LOG_RAW | false | Log full raw requests/responses |
| NADIRCLAW_DAILY_BUDGET | none | Daily spend limit in USD |
| NADIRCLAW_MONTHLY_BUDGET | none | Monthly spend limit in USD |
| NADIRCLAW_BUDGET_WARN_THRESHOLD | 0.8 | Alert at this fraction of budget |
| NADIRCLAW_BUDGET_WEBHOOK_URL | none | Webhook for budget alerts |
| NADIRCLAW_BUDGET_STDOUT_ALERTS | false | Print budget alerts to stdout |
| NADIRCLAW_CACHE_TTL | default | Cache time-to-live in seconds |
| NADIRCLAW_CACHE_MAX_SIZE | default | Max cache entries |
| GEMINI_API_KEY | unset | Google Gemini API key |
| ANTHROPIC_API_KEY | unset | Anthropic API key |
| OPENAI_API_KEY | unset | OpenAI API key |
| OLLAMA_API_BASE | http://localhost:11434 | Ollama base URL |
| OTEL_EXPORTER_OTLP_ENDPOINT | empty | OpenTelemetry collector endpoint |

Config File

The primary configuration lives in ~/.nadirclaw/.env. Example:

# ~/.nadirclaw/.env

# API keys
GEMINI_API_KEY=AIza...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Model routing
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro

# Server
NADIRCLAW_PORT=8856

# Budget
NADIRCLAW_DAILY_BUDGET=10.00
NADIRCLAW_MONTHLY_BUDGET=200.00

Credentials are stored separately in ~/.nadirclaw/credentials.json (managed via nadirclaw auth). Logs go to ~/.nadirclaw/logs/.

Model Setup

Configure which model handles each routing tier:

| Setup | Simple Model | Complex Model | Keys Needed |
|---|---|---|---|
| Gemini + Gemini | gemini-2.5-flash | gemini-2.5-pro | GEMINI_API_KEY |
| Gemini + Claude | gemini-2.5-flash | claude-sonnet-4-5-20250929 | GEMINI_API_KEY + ANTHROPIC_API_KEY |
| Claude + Claude | claude-haiku-4-5-20251001 | claude-sonnet-4-5-20250929 | ANTHROPIC_API_KEY |
| OpenAI + OpenAI | gpt-4.1-mini | gpt-4.1 | OPENAI_API_KEY |
| Fully local | ollama/llama3.1:8b | ollama/qwen3:32b | None |

Gemini models are called natively via the Google GenAI SDK. All other models go through LiteLLM (100+ providers).
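
That dispatch rule can be sketched in a couple of lines. This is illustrative only — the prefix check and function name are assumptions, not NadirClaw's actual routing code:

```python
def pick_backend(model: str) -> str:
    """Gemini models use the Google GenAI SDK natively; everything else goes through LiteLLM.
    The `gemini-` prefix check is an assumed heuristic for illustration."""
    return "google-genai" if model.startswith("gemini-") else "litellm"

print(pick_backend("gemini-2.5-flash"))    # google-genai
print(pick_backend("ollama/llama3.1:8b"))  # litellm
```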

Model Aliases

Use short names instead of full model IDs:

| Alias | Resolves To |
|---|---|
| sonnet | claude-sonnet-4-5-20250929 |
| opus | claude-opus-4-6-20250918 |
| haiku | claude-haiku-4-5-20251001 |
| gpt4 | gpt-4.1 |
| flash | gemini-2.5-flash |
| deepseek | deepseek/deepseek-chat |
| llama | ollama/llama3.1:8b |
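
Alias resolution is effectively a dictionary lookup over this table. A sketch, assuming unknown names pass through untouched (the mapping values come from the table; the function name is illustrative):

```python
# Alias table as documented; a full model ID passes through unchanged.
MODEL_ALIASES = {
    "sonnet": "claude-sonnet-4-5-20250929",
    "opus": "claude-opus-4-6-20250918",
    "haiku": "claude-haiku-4-5-20251001",
    "gpt4": "gpt-4.1",
    "flash": "gemini-2.5-flash",
    "deepseek": "deepseek/deepseek-chat",
    "llama": "ollama/llama3.1:8b",
}

def resolve_model(name: str) -> str:
    """Expand a short alias; anything not in the table is returned as-is."""
    return MODEL_ALIASES.get(name, name)

print(resolve_model("flash"))    # gemini-2.5-flash
print(resolve_model("gpt-4.1"))  # gpt-4.1 (already a full ID)
```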

Authentication

NadirClaw checks credentials in order: OpenClaw stored token → NadirClaw credential → environment variable.

# Add API keys
nadirclaw auth add --provider google --key AIza...
nadirclaw auth add --provider anthropic --key sk-ant-...
nadirclaw auth add --provider openai --key sk-...

# OAuth login (no API key needed)
nadirclaw auth openai login
nadirclaw auth anthropic login
nadirclaw auth gemini login

# Store Claude subscription token
nadirclaw auth setup-token

# Check status
nadirclaw auth status

# Remove
nadirclaw auth remove google
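
The credential lookup order described above (OpenClaw stored token, then NadirClaw credential, then environment variable) reduces to a first-match search. A sketch with hypothetical store arguments — not NadirClaw's internal API:

```python
import os

# Environment variable per provider, per the Environment Variables table.
ENV_KEYS = {
    "google": "GEMINI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
}

def resolve_credential(provider, openclaw_tokens, nadirclaw_creds, env=os.environ):
    """Return the first credential found, in the documented precedence order:
    OpenClaw stored token -> NadirClaw credential -> environment variable."""
    return (openclaw_tokens.get(provider)
            or nadirclaw_creds.get(provider)
            or env.get(ENV_KEYS.get(provider, "")))
```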

Routing

How Classification Works

NadirClaw uses a binary complexity classifier based on sentence embeddings:

  1. Pre-computed centroids — two tiny vectors (~1.5 KB each) derived from ~170 seed prompts, shipped with the package.
  2. Classification — computes the prompt's embedding via all-MiniLM-L6-v2 and measures cosine similarity to both centroids. Closer to complex centroid → complex model.
  3. Borderline handling — when confidence falls below the threshold (default 0.06), the prompt routes to the complex model: it's cheaper to over-serve than under-serve.
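
The three steps above amount to a few lines of vector math. A minimal sketch in pure Python, with toy 2-dimensional centroids standing in for the real 384-dimensional MiniLM embeddings (function and variable names are illustrative, not NadirClaw's internals):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify(embedding, simple_centroid, complex_centroid, threshold=0.06):
    """Route to whichever centroid is closer; borderline prompts default to complex."""
    sim_simple = cosine(embedding, simple_centroid)
    sim_complex = cosine(embedding, complex_centroid)
    confidence = abs(sim_simple - sim_complex)
    if confidence < threshold:           # borderline: cheaper to over-serve
        return "complex", confidence
    tier = "simple" if sim_simple > sim_complex else "complex"
    return tier, confidence

# Toy centroids standing in for the shipped ~170-seed-prompt centroids.
simple_c, complex_c = [1.0, 0.1], [0.1, 1.0]
print(classify([0.9, 0.2], simple_c, complex_c))  # clearly closer to simple
print(classify([0.5, 0.5], simple_c, complex_c))  # equidistant -> borderline -> complex
```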

Routing Tiers

| Tier | When Used | Typical Prompts |
|---|---|---|
| Simple | Prompt closer to simple centroid with confidence above threshold | "What does this function do?", "Format this JSON", "Add a docstring" |
| Complex | Prompt closer to complex centroid, or borderline | "Refactor this module", "Design a caching layer", "Debug this deadlock" |
| Reasoning | 2+ reasoning markers detected ("step by step", "prove that", "analyze tradeoffs") | Mathematical proofs, architecture analysis, critical evaluations |
| Agentic | Tool definitions, tool-role messages, agent system prompts, deep conversations (>10 messages) | Any multi-step agent workflow, coding agent sessions |
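
The agentic tier's triggers can be expressed as a simple predicate. A sketch covering the tool-definition, tool-role, and conversation-depth triggers (system-prompt detection omitted for brevity; the function name is illustrative, not NadirClaw's actual code):

```python
def is_agentic(request: dict) -> bool:
    """Heuristic mirroring the documented agentic-tier triggers (illustrative)."""
    messages = request.get("messages", [])
    if request.get("tools"):                            # tool definitions present
        return True
    if any(m.get("role") == "tool" for m in messages):  # tool-role messages
        return True
    if len(messages) > 10:                              # deep conversation
        return True
    return False

print(is_agentic({"messages": [{"role": "user", "content": "hi"}]}))  # False
print(is_agentic({"messages": [], "tools": [{"name": "search"}]}))    # True
```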

Routing Modifiers

After base classification, these overrides apply in order:

Confidence Threshold

The NADIRCLAW_CONFIDENCE_THRESHOLD (default 0.06) controls borderline routing. Lower values route more prompts to complex. Adjust based on your quality tolerance:

# More conservative (routes more to complex)
NADIRCLAW_CONFIDENCE_THRESHOLD=0.03

# More aggressive savings (routes more to simple)
NADIRCLAW_CONFIDENCE_THRESHOLD=0.10

Routing Profiles

Override routing strategy per-request via the model field:

| Profile | Model Field | Behavior |
|---|---|---|
| auto | auto or omit | Smart routing (default) |
| eco | eco | Always use simple model |
| premium | premium | Always use complex model |
| free | free | Use free/local fallback model |
| reasoning | reasoning | Use reasoning model |

# Use eco mode for maximum savings
curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "eco", "messages": [{"role": "user", "content": "Hello"}]}'

Fallback Chains

When a model fails (429 rate limit, 5xx error, or timeout), NadirClaw cascades through a configurable chain of fallback models until one succeeds.

# Configure fallback order
NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5-20250929,gemini-2.5-flash

Default behavior: If no fallback chain is configured, NadirClaw uses all your configured tier models. For example, if the simple model hits a 429, it retries once, then tries the complex model.

Rate limit handling: On 429 errors, NadirClaw automatically retries once before moving to the next model in the chain. If all models are exhausted, it returns a friendly error message.
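
The cascade described above — retry once on a rate limit, then move down the chain — might look like the following sketch. RateLimitError and the call argument are stand-ins for the real provider errors and client:

```python
class RateLimitError(Exception):
    """Stand-in for a provider's 429 error."""

def complete_with_fallback(chain, call):
    """Try each model in order; retry once on a rate limit before moving on.
    Non-rate-limit failures skip straight to the next model."""
    last_error = None
    for model in chain:
        for _attempt in range(2):        # initial try + one retry on 429
            try:
                return call(model)
            except RateLimitError as e:
                last_error = e           # retry this model once, then fall through
            except Exception as e:
                last_error = e
                break                    # other failure: go to next model immediately
    raise RuntimeError(f"All models in the fallback chain failed: {last_error}")
```

For example, if the first model always returns 429, it is tried twice before the chain advances to the second model.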

Prompt Caching

NadirClaw includes an in-memory LRU cache for identical chat completions, skipping redundant LLM calls entirely.

# Configure cache
NADIRCLAW_CACHE_TTL=300          # TTL in seconds (default varies)
NADIRCLAW_CACHE_MAX_SIZE=1000    # Max cached entries

Monitor cache:

# CLI
nadirclaw cache

# API endpoint
curl http://localhost:8856/v1/cache

Cache is keyed on the full message content. Streaming requests with identical content will also hit the cache. Only exact matches count — no fuzzy matching.
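
A cache with those properties — exact-match key on message content, TTL expiry, LRU eviction — can be sketched in a few lines. Class and method names here are illustrative, not NadirClaw's internals:

```python
import json
import time
from collections import OrderedDict

class CompletionCache:
    """Tiny TTL + LRU cache keyed on exact message content (illustrative sketch)."""

    def __init__(self, ttl=300, max_size=1000):
        self.ttl = ttl
        self.max_size = max_size
        self._data = OrderedDict()  # key -> (expires_at, response)

    def _key(self, messages):
        return json.dumps(messages, sort_keys=True)  # exact matches only

    def get(self, messages):
        key = self._key(messages)
        entry = self._data.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._data.pop(key, None)     # drop expired entries lazily
            return None
        self._data.move_to_end(key)       # mark as recently used
        return entry[1]

    def put(self, messages, response):
        key = self._key(messages)
        self._data[key] = (time.monotonic() + self.ttl, response)
        self._data.move_to_end(key)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least recently used
```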

Claude Code

NadirClaw works as a drop-in proxy for Claude Code:

# Point Claude Code at NadirClaw
export ANTHROPIC_BASE_URL=http://localhost:8856/v1
export ANTHROPIC_API_KEY=local

# Start NadirClaw, then use Claude Code normally
nadirclaw serve --verbose
claude

Or use a shell alias:

alias claude-routed='ANTHROPIC_BASE_URL=http://localhost:8856/v1 ANTHROPIC_API_KEY=local claude'

Simple prompts ("read this file", "what does this function do?") route to a cheap model like Gemini Flash. Complex prompts ("refactor this module") stay on Claude. Typical savings: 40–70%.

Using your Claude subscription

# OAuth login (opens browser)
nadirclaw auth anthropic login

# Or store token directly
nadirclaw auth setup-token

OpenClaw

# Auto-configure OpenClaw to use NadirClaw
nadirclaw openclaw onboard

# Start the router
nadirclaw serve

This writes NadirClaw as a provider in ~/.openclaw/openclaw.json with model nadirclaw/auto. OpenClaw auto-reloads — no restart needed.

The generated config:

{
  "models": {
    "providers": {
      "nadirclaw": {
        "baseUrl": "http://localhost:8856/v1",
        "apiKey": "local",
        "api": "openai-completions",
        "models": [{ "id": "auto", "name": "auto" }]
      }
    }
  },
  "agents": {
    "defaults": {
      "model": { "primary": "nadirclaw/auto" }
    }
  }
}

Codex

# Auto-configure Codex
nadirclaw codex onboard

# Start the router
nadirclaw serve

This writes ~/.codex/config.toml:

model_provider = "nadirclaw"

[model_providers.nadirclaw]
base_url = "http://localhost:8856/v1"
api_key = "local"

OpenAI OAuth

# Use ChatGPT subscription instead of API key
nadirclaw auth openai login

Any OpenAI-Compatible Client

NadirClaw exposes a standard OpenAI-compatible API. Point any tool at it:

# Base URL: http://localhost:8856/v1
# Model: "auto" (or omit)
# API Key: "local" (or anything — auth disabled by default)

Python (openai SDK)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8856/v1",
    api_key="local",
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.choices[0].message.content)

curl

curl http://localhost:8856/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "stream": true
  }'

Works with Continue, aider, Cursor, or any tool that speaks the OpenAI chat completions API. Just set the base URL to http://localhost:8856/v1.

CLI Reference

nadirclaw serve

Start the router server.

nadirclaw serve [OPTIONS]

Options:
  --port INTEGER          Port (default: 8856)
  --simple-model TEXT     Model for simple prompts
  --complex-model TEXT    Model for complex prompts
  --token TEXT            Auth token
  --verbose               Debug logging
  --log-raw               Log full raw requests/responses to JSONL

nadirclaw setup

Interactive setup wizard — guides you through providers, API keys, and model selection.

nadirclaw setup

nadirclaw classify

Classify a prompt locally without running the server:

$ nadirclaw classify "What is 2+2?"
Tier:       simple
Confidence: 0.2848
Score:      0.0000
Model:      gemini-3-flash-preview

$ nadirclaw classify "Design a distributed system for real-time trading"
Tier:       complex
Confidence: 0.1843
Score:      1.0000
Model:      gemini-2.5-pro

nadirclaw report

Analyze request logs:

nadirclaw report                     # full report
nadirclaw report --since 24h         # last 24 hours
nadirclaw report --since 7d          # last 7 days
nadirclaw report --model gemini      # filter by model
nadirclaw report --format json       # machine-readable JSON
nadirclaw report --export report.txt # save to file

nadirclaw savings

Show cost savings with monthly projections:

nadirclaw savings
nadirclaw savings --since 7d

nadirclaw dashboard

Live terminal dashboard with real-time stats. Also available as web UI at http://localhost:8856/dashboard.

pip install nadirclaw[dashboard]
nadirclaw dashboard

nadirclaw status

Show current config, credentials, and server status:

$ nadirclaw status
NadirClaw Status
----------------------------------------
Simple model:  gemini-3-flash-preview
Complex model: gemini-2.5-pro
Port:          8856
Threshold:     0.06
Server:        RUNNING (ok)

Other Commands

nadirclaw auth add/status/remove    # Manage credentials
nadirclaw auth openai login         # OAuth login
nadirclaw codex onboard             # Configure Codex integration
nadirclaw openclaw onboard          # Configure OpenClaw integration
nadirclaw ollama discover           # Auto-discover Ollama instances
nadirclaw cache                     # View cache stats
nadirclaw build-centroids           # Regenerate centroid vectors

Budget & Cost Tracking

NadirClaw tracks per-request costs in real time and supports budget limits with alerts.

Setting Budgets

# In ~/.nadirclaw/.env
NADIRCLAW_DAILY_BUDGET=10.00
NADIRCLAW_MONTHLY_BUDGET=200.00
NADIRCLAW_BUDGET_WARN_THRESHOLD=0.8  # Alert at 80% of budget

Alerts

When spend crosses the warning threshold, NadirClaw can:

  - Post an alert to the webhook configured in NADIRCLAW_BUDGET_WEBHOOK_URL
  - Print an alert to stdout (NADIRCLAW_BUDGET_STDOUT_ALERTS=true)

Reporting

# See savings
nadirclaw savings

# Detailed report with cost breakdown
nadirclaw report --since 7d

# Live monitoring
nadirclaw dashboard

Reports include: total requests, tier distribution, per-model usage and tokens, latency percentiles (p50/p95), fallback counts, and error rates.
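
For reference, the p50/p95 figures are plain latency percentiles. A nearest-rank sketch (NadirClaw's exact interpolation method may differ):

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    k = max(0, math.ceil(p / 100 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [120, 95, 300, 110, 105, 980, 130, 115, 125, 100]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))  # 115 980
```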

Docker

NadirClaw + Ollama (fully local, zero cost)

git clone https://github.com/doramirdor/NadirClaw.git && cd NadirClaw
docker compose up

This starts Ollama and NadirClaw on port 8856. Pull a model:

docker compose exec ollama ollama pull llama3.1:8b

With cloud providers

Create a .env file with API keys and model config (see .env.example), then restart:

# .env
GEMINI_API_KEY=AIza...
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro

Standalone (no Ollama)

docker build -t nadirclaw .
docker run -p 8856:8856 --env-file .env nadirclaw

API Reference

Auth is disabled by default (local-only). Set NADIRCLAW_AUTH_TOKEN to require a bearer token.

| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | OpenAI-compatible completions with auto routing (supports stream: true) |
| /v1/classify | POST | Classify a prompt without calling an LLM |
| /v1/classify/batch | POST | Classify multiple prompts at once |
| /v1/models | GET | List available models |
| /v1/logs | GET | View recent request logs |
| /v1/cache | GET | Cache stats |
| /health | GET | Health check (no auth) |
| /dashboard | GET | Web dashboard UI |

Chat Completions Request

POST /v1/chat/completions

{
  "model": "auto",           // or "eco", "premium", "free", "reasoning", or a model alias
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"}
  ],
  "stream": true,            // optional, SSE streaming
  "temperature": 0.7,        // optional, passed through to provider
  "tools": [...]             // optional, triggers agentic detection
}

Classify Request

POST /v1/classify

{
  "messages": [
    {"role": "user", "content": "What is 2+2?"}
  ]
}

// Response:
{
  "tier": "simple",
  "confidence": 0.2848,
  "score": 0.0,
  "model": "gemini-3-flash-preview"
}
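
A minimal client for this endpoint using only the Python standard library. It assumes nadirclaw serve is running locally with auth disabled; the helper names are illustrative:

```python
import json
from urllib import request as urlrequest

def build_classify_body(messages):
    """Encode the request body shown above."""
    return json.dumps({"messages": messages}).encode("utf-8")

def classify(messages, base_url="http://localhost:8856"):
    """POST to /v1/classify and return the parsed JSON response."""
    req = urlrequest.Request(
        base_url + "/v1/classify",
        data=build_classify_body(messages),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running server):
# classify([{"role": "user", "content": "What is 2+2?"}])
# returns a dict with tier, confidence, score, and model fields
```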

Troubleshooting

First request is slow (2–3 seconds)

Normal — NadirClaw downloads the sentence embedding model (~80 MB) on first use. Subsequent requests classify in ~10ms.

Port 8856 already in use

# Use a different port
nadirclaw serve --port 9000

# Or set in env
NADIRCLAW_PORT=9000

Model returning errors

Check that the provider's API key is configured (nadirclaw auth status) and that the model name is valid for that provider. The underlying provider error is recorded in the request logs (nadirclaw report --since 24h, or ls ~/.nadirclaw/logs/).

Ollama not found

# Auto-discover Ollama instances
nadirclaw ollama discover

# Or set manually
OLLAMA_API_BASE=http://192.168.1.100:11434 nadirclaw serve

Too many prompts routed to complex

Raise the confidence threshold to route more to simple:

NADIRCLAW_CONFIDENCE_THRESHOLD=0.10

Too many prompts routed to simple (quality issues)

Lower the confidence threshold:

NADIRCLAW_CONFIDENCE_THRESHOLD=0.03

Streaming not working

NadirClaw supports full SSE streaming. Ensure your request includes "stream": true and your client handles SSE format. Check that you're not behind a reverse proxy that buffers responses.

Rate limits (429 errors)

NadirClaw handles these automatically — retries once, then falls through the fallback chain. If all models are exhausted, configure additional fallbacks:

NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5-20250929,gemini-2.5-flash

Viewing logs

# Request logs
ls ~/.nadirclaw/logs/

# Full raw logging (for debugging)
nadirclaw serve --log-raw

# Analyze logs
nadirclaw report --since 24h

Need help? Open an issue on GitHub or check the README for the latest info.