NadirClaw Documentation
Open-source LLM router that classifies prompt complexity and routes to the optimal model. Cut AI API costs by 40–70% with zero code changes.
Installation
pip (recommended)
pip install nadirclaw
One-line install script
curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/install.sh | sh
This clones to ~/.nadirclaw, creates a venv, installs deps, and adds nadirclaw to your PATH. Run it again to update.
From source
git clone https://github.com/doramirdor/NadirClaw.git
cd NadirClaw
python3 -m venv venv
source venv/bin/activate
pip install -e .
Docker
git clone https://github.com/doramirdor/NadirClaw.git && cd NadirClaw
docker compose up
Uninstall
rm -rf ~/.nadirclaw
sudo rm -f /usr/local/bin/nadirclaw
Quick Start
# Install
pip install nadirclaw
# Interactive setup (providers, API keys, models)
nadirclaw setup
# Start the router
nadirclaw serve --verbose
NadirClaw starts on http://localhost:8856 with sensible defaults (Gemini Flash for simple, Codex for complex). If you skip nadirclaw setup, the serve command will offer to run it on first launch.
First Run
On first request, NadirClaw downloads the all-MiniLM-L6-v2 sentence embedding model (~80 MB). This takes 2–3 seconds. Subsequent requests classify in ~10ms.
Prerequisites: Python 3.10+ and at least one LLM provider — a Gemini API key (free tier available), Ollama running locally (free), or any cloud provider API key.
Environment Variables
NadirClaw loads config from ~/.nadirclaw/.env. If that doesn't exist, it falls back to .env in the current directory.
| Variable | Default | Description |
|---|---|---|
| NADIRCLAW_SIMPLE_MODEL | gemini-3-flash-preview | Model for simple prompts |
| NADIRCLAW_COMPLEX_MODEL | openai-codex/gpt-5.3-codex | Model for complex prompts |
| NADIRCLAW_REASONING_MODEL | falls back to complex | Model for reasoning tasks |
| NADIRCLAW_FREE_MODEL | falls back to simple | Free/local fallback model |
| NADIRCLAW_FALLBACK_CHAIN | all tier models | Comma-separated cascade on failure |
| NADIRCLAW_CONFIDENCE_THRESHOLD | 0.06 | Classification threshold (lower = more complex) |
| NADIRCLAW_PORT | 8856 | Server port |
| NADIRCLAW_AUTH_TOKEN | empty (auth disabled) | Bearer token requirement |
| NADIRCLAW_LOG_DIR | ~/.nadirclaw/logs | Log directory |
| NADIRCLAW_LOG_RAW | false | Log full raw requests/responses |
| NADIRCLAW_DAILY_BUDGET | none | Daily spend limit in USD |
| NADIRCLAW_MONTHLY_BUDGET | none | Monthly spend limit in USD |
| NADIRCLAW_BUDGET_WARN_THRESHOLD | 0.8 | Alert at this fraction of budget |
| NADIRCLAW_BUDGET_WEBHOOK_URL | none | Webhook for budget alerts |
| NADIRCLAW_BUDGET_STDOUT_ALERTS | false | Print budget alerts to stdout |
| NADIRCLAW_CACHE_TTL | default | Cache time-to-live in seconds |
| NADIRCLAW_CACHE_MAX_SIZE | default | Max cache entries |
| GEMINI_API_KEY | — | Google Gemini API key |
| ANTHROPIC_API_KEY | — | Anthropic API key |
| OPENAI_API_KEY | — | OpenAI API key |
| OLLAMA_API_BASE | http://localhost:11434 | Ollama base URL |
| OTEL_EXPORTER_OTLP_ENDPOINT | empty | OpenTelemetry collector endpoint |
Config File
The primary configuration lives in ~/.nadirclaw/.env. Example:
# ~/.nadirclaw/.env
# API keys
GEMINI_API_KEY=AIza...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Model routing
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
# Server
NADIRCLAW_PORT=8856
# Budget
NADIRCLAW_DAILY_BUDGET=10.00
NADIRCLAW_MONTHLY_BUDGET=200.00
Credentials are stored separately in ~/.nadirclaw/credentials.json (managed via nadirclaw auth). Logs go to ~/.nadirclaw/logs/.
Model Setup
Configure which model handles each routing tier:
| Setup | Simple Model | Complex Model | Keys Needed |
|---|---|---|---|
| Gemini + Gemini | gemini-2.5-flash | gemini-2.5-pro | GEMINI_API_KEY |
| Gemini + Claude | gemini-2.5-flash | claude-sonnet-4-5-20250929 | GEMINI_API_KEY + ANTHROPIC_API_KEY |
| Claude + Claude | claude-haiku-4-5-20251001 | claude-sonnet-4-5-20250929 | ANTHROPIC_API_KEY |
| OpenAI + OpenAI | gpt-4.1-mini | gpt-4.1 | OPENAI_API_KEY |
| Fully local | ollama/llama3.1:8b | ollama/qwen3:32b | None |
Gemini models are called natively via the Google GenAI SDK. All other models go through LiteLLM (100+ providers).
Model Aliases
Use short names instead of full model IDs:
| Alias | Resolves To |
|---|---|
| sonnet | claude-sonnet-4-5-20250929 |
| opus | claude-opus-4-6-20250918 |
| haiku | claude-haiku-4-5-20251001 |
| gpt4 | gpt-4.1 |
| flash | gemini-2.5-flash |
| deepseek | deepseek/deepseek-chat |
| llama | ollama/llama3.1:8b |
Authentication
NadirClaw checks credentials in order: OpenClaw stored token → NadirClaw credential → environment variable.
# Add API keys
nadirclaw auth add --provider google --key AIza...
nadirclaw auth add --provider anthropic --key sk-ant-...
nadirclaw auth add --provider openai --key sk-...
# OAuth login (no API key needed)
nadirclaw auth openai login
nadirclaw auth anthropic login
nadirclaw auth gemini login
# Store Claude subscription token
nadirclaw auth setup-token
# Check status
nadirclaw auth status
# Remove
nadirclaw auth remove google
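The precedence order above amounts to a first-non-empty lookup. The helper below is an illustrative sketch of that order only; the function name and arguments are hypothetical, not NadirClaw's actual API:

```python
def resolve_credential(openclaw_token, nadirclaw_credential, env_value):
    """Return the first available credential, in NadirClaw's documented order:
    OpenClaw stored token -> NadirClaw credential -> environment variable."""
    for source in (openclaw_token, nadirclaw_credential, env_value):
        if source:
            return source
    return None
```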
Routing
How Classification Works
NadirClaw uses a binary complexity classifier based on sentence embeddings:
- Pre-computed centroids — two tiny vectors (~1.5 KB each) derived from ~170 seed prompts, shipped with the package.
- Classification — computes the prompt's embedding via all-MiniLM-L6-v2 and measures cosine similarity to both centroids. Closer to the complex centroid → complex model.
- Borderline handling — when confidence is below threshold (default 0.06), defaults to complex. It's cheaper to over-serve than under-serve.
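The centroid comparison can be sketched in a few lines of Python. Note the centroid vectors below are tiny made-up stand-ins for the ~1.5 KB vectors NadirClaw ships, and the embedding is passed in directly rather than computed with all-MiniLM-L6-v2:

```python
import numpy as np

# Hypothetical stand-ins for the shipped centroids (real ones are 384-dim).
SIMPLE_CENTROID = np.array([0.9, 0.1, 0.0])
COMPLEX_CENTROID = np.array([0.1, 0.9, 0.2])
THRESHOLD = 0.06  # NADIRCLAW_CONFIDENCE_THRESHOLD

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(embedding, threshold=THRESHOLD):
    sim_simple = cosine(embedding, SIMPLE_CENTROID)
    sim_complex = cosine(embedding, COMPLEX_CENTROID)
    confidence = abs(sim_simple - sim_complex)
    if confidence < threshold:
        return "complex"  # borderline: cheaper to over-serve than under-serve
    return "simple" if sim_simple > sim_complex else "complex"
```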
Routing Tiers
| Tier | When Used | Typical Prompts |
|---|---|---|
| Simple | Prompt closer to simple centroid with confidence above threshold | "What does this function do?", "Format this JSON", "Add a docstring" |
| Complex | Prompt closer to complex centroid, or borderline | "Refactor this module", "Design a caching layer", "Debug this deadlock" |
| Reasoning | 2+ reasoning markers detected ("step by step", "prove that", "analyze tradeoffs") | Mathematical proofs, architecture analysis, critical evaluations |
| Agentic | Tool definitions, tool-role messages, agent system prompts, deep conversations (>10 messages) | Any multi-step agent workflow, coding agent sessions |
Routing Modifiers
After base classification, these overrides apply in order:
- Agentic detection — forces complex when tool definitions, tool-role messages, or agent system prompts are detected
- Reasoning detection — routes to reasoning model when 2+ reasoning markers found
- Context window check — swaps to a model with larger context if conversation exceeds model's limit
- Session persistence — reuses the same model for follow-up messages (30-minute TTL)
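The agentic and reasoning checks above can be approximated as simple predicates over the request. The marker list and message shapes here are illustrative, not NadirClaw's actual internals:

```python
# Illustrative subset of reasoning markers
REASONING_MARKERS = ["step by step", "prove that", "analyze tradeoffs"]

def is_reasoning(prompt: str) -> bool:
    """Route to the reasoning model when 2+ markers appear."""
    text = prompt.lower()
    return sum(marker in text for marker in REASONING_MARKERS) >= 2

def is_agentic(request: dict) -> bool:
    """Tool definitions, tool-role messages, or deep conversations force complex."""
    messages = request.get("messages", [])
    return (
        bool(request.get("tools"))
        or any(m.get("role") == "tool" for m in messages)
        or len(messages) > 10
    )
```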
Confidence Threshold
The NADIRCLAW_CONFIDENCE_THRESHOLD (default 0.06) controls borderline routing. Lower values route more prompts to complex. Adjust based on your quality tolerance:
# More conservative (routes more to complex)
NADIRCLAW_CONFIDENCE_THRESHOLD=0.03
# More aggressive savings (routes more to simple)
NADIRCLAW_CONFIDENCE_THRESHOLD=0.10
Routing Profiles
Override routing strategy per-request via the model field:
| Profile | Model Field | Behavior |
|---|---|---|
| auto | auto or omit | Smart routing (default) |
| eco | eco | Always use simple model |
| premium | premium | Always use complex model |
| free | free | Use free/local fallback model |
| reasoning | reasoning | Use reasoning model |
# Use eco mode for maximum savings
curl http://localhost:8856/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "eco", "messages": [{"role": "user", "content": "Hello"}]}'
Fallback Chains
When a model fails (429 rate limit, 5xx error, or timeout), NadirClaw cascades through a configurable chain of fallback models until one succeeds.
# Configure fallback order
NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5-20250929,gemini-2.5-flash
Default behavior: If no fallback chain is configured, NadirClaw uses all your configured tier models. For example, if the simple model hits a 429, it retries once, then tries the complex model.
Rate limit handling: On 429 errors, NadirClaw automatically retries once before moving to the next model in the chain. If all models are exhausted, it returns a friendly error message.
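The cascade described above could look roughly like the sketch below; call_model and the error class are placeholders standing in for the real provider calls and error types:

```python
class RateLimitError(Exception):
    """Placeholder for a provider 429 error."""

def cascade(chain, call_model, retries_on_429=1):
    """Try each model in order; retry once on a rate limit, then move on."""
    errors = []
    for model in chain:
        for _ in range(1 + retries_on_429):
            try:
                return call_model(model)
            except RateLimitError as e:
                errors.append((model, e))  # 429: retry same model once
            except Exception as e:
                errors.append((model, e))
                break  # 5xx/timeout: no retry, go to next model
    raise RuntimeError(f"All models exhausted: {errors}")
```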
Prompt Caching
NadirClaw includes an in-memory LRU cache for identical chat completions, skipping redundant LLM calls entirely.
# Configure cache
NADIRCLAW_CACHE_TTL=300 # TTL in seconds (default varies)
NADIRCLAW_CACHE_MAX_SIZE=1000 # Max cached entries
Monitor cache:
# CLI
nadirclaw cache
# API endpoint
curl http://localhost:8856/v1/cache
Cache is keyed on the full message content. Streaming requests with identical content will also hit the cache. Only exact matches count — no fuzzy matching.
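A minimal version of such a cache, keyed on the exact message content, might look like this (assumed behavior only, not NadirClaw's implementation):

```python
import hashlib
import json
import time
from collections import OrderedDict

class ResponseCache:
    """LRU cache with TTL, keyed on the full message list (exact match only)."""

    def __init__(self, ttl=300, max_size=1000):
        self.ttl = ttl
        self.max_size = max_size
        self._store = OrderedDict()  # key -> (expires_at, response)

    def _key(self, messages):
        # Canonical JSON so identical content always hashes the same
        payload = json.dumps(messages, sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()

    def get(self, messages):
        key = self._key(messages)
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # drop expired entry
            return None
        self._store.move_to_end(key)  # mark as recently used
        return entry[1]

    def put(self, messages, response):
        key = self._key(messages)
        self._store[key] = (time.monotonic() + self.ttl, response)
        self._store.move_to_end(key)
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used
```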
Claude Code
NadirClaw works as a drop-in proxy for Claude Code:
# Point Claude Code at NadirClaw
export ANTHROPIC_BASE_URL=http://localhost:8856/v1
export ANTHROPIC_API_KEY=local
# Start NadirClaw, then use Claude Code normally
nadirclaw serve --verbose
claude
Or use a shell alias:
alias claude-routed='ANTHROPIC_BASE_URL=http://localhost:8856/v1 ANTHROPIC_API_KEY=local claude'
Simple prompts ("read this file", "what does this function do?") route to a cheap model like Gemini Flash. Complex prompts ("refactor this module") stay on Claude. Typical savings: 40–70%.
Using your Claude subscription
# OAuth login (opens browser)
nadirclaw auth anthropic login
# Or store token directly
nadirclaw auth setup-token
OpenClaw
# Auto-configure OpenClaw to use NadirClaw
nadirclaw openclaw onboard
# Start the router
nadirclaw serve
This writes NadirClaw as a provider in ~/.openclaw/openclaw.json with model nadirclaw/auto. OpenClaw auto-reloads — no restart needed.
The generated config:
{
"models": {
"providers": {
"nadirclaw": {
"baseUrl": "http://localhost:8856/v1",
"apiKey": "local",
"api": "openai-completions",
"models": [{ "id": "auto", "name": "auto" }]
}
}
},
"agents": {
"defaults": {
"model": { "primary": "nadirclaw/auto" }
}
}
}
Codex
# Auto-configure Codex
nadirclaw codex onboard
# Start the router
nadirclaw serve
This writes ~/.codex/config.toml:
model_provider = "nadirclaw"
[model_providers.nadirclaw]
base_url = "http://localhost:8856/v1"
api_key = "local"
OpenAI OAuth
# Use ChatGPT subscription instead of API key
nadirclaw auth openai login
Any OpenAI-Compatible Client
NadirClaw exposes a standard OpenAI-compatible API. Point any tool at it:
# Base URL: http://localhost:8856/v1
# Model: "auto" (or omit)
# API Key: "local" (or anything — auth disabled by default)
Python (openai SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8856/v1",
api_key="local",
)
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.choices[0].message.content)
curl
curl http://localhost:8856/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "What is 2+2?"}],
"stream": true
}'
Works with Continue, aider, Cursor, or any tool that speaks the OpenAI chat completions API. Just set the base URL to http://localhost:8856/v1.
CLI Reference
nadirclaw serve
Start the router server.
nadirclaw serve [OPTIONS]
Options:
--port INTEGER Port (default: 8856)
--simple-model TEXT Model for simple prompts
--complex-model TEXT Model for complex prompts
--token TEXT Auth token
--verbose Debug logging
--log-raw Log full raw requests/responses to JSONL
nadirclaw setup
Interactive setup wizard — guides you through providers, API keys, and model selection.
nadirclaw setup
nadirclaw classify
Classify a prompt locally without running the server:
$ nadirclaw classify "What is 2+2?"
Tier: simple
Confidence: 0.2848
Score: 0.0000
Model: gemini-3-flash-preview
$ nadirclaw classify "Design a distributed system for real-time trading"
Tier: complex
Confidence: 0.1843
Score: 1.0000
Model: gemini-2.5-pro
nadirclaw report
Analyze request logs:
nadirclaw report # full report
nadirclaw report --since 24h # last 24 hours
nadirclaw report --since 7d # last 7 days
nadirclaw report --model gemini # filter by model
nadirclaw report --format json # machine-readable JSON
nadirclaw report --export report.txt # save to file
nadirclaw savings
Show cost savings with monthly projections:
nadirclaw savings
nadirclaw savings --since 7d
nadirclaw dashboard
Live terminal dashboard with real-time stats. Also available as web UI at http://localhost:8856/dashboard.
pip install nadirclaw[dashboard]
nadirclaw dashboard
nadirclaw status
Show current config, credentials, and server status:
$ nadirclaw status
NadirClaw Status
----------------------------------------
Simple model: gemini-3-flash-preview
Complex model: gemini-2.5-pro
Port: 8856
Threshold: 0.06
Server: RUNNING (ok)
Other Commands
nadirclaw auth add/status/remove # Manage credentials
nadirclaw auth openai login # OAuth login
nadirclaw codex onboard # Configure Codex integration
nadirclaw openclaw onboard # Configure OpenClaw integration
nadirclaw ollama discover # Auto-discover Ollama instances
nadirclaw cache # View cache stats
nadirclaw build-centroids # Regenerate centroid vectors
Budget & Cost Tracking
NadirClaw tracks per-request costs in real time and supports budget limits with alerts.
Setting Budgets
# In ~/.nadirclaw/.env
NADIRCLAW_DAILY_BUDGET=10.00
NADIRCLAW_MONTHLY_BUDGET=200.00
NADIRCLAW_BUDGET_WARN_THRESHOLD=0.8 # Alert at 80% of budget
Alerts
When spend crosses the warning threshold, NadirClaw can:
- Webhook: POST a JSON payload to NADIRCLAW_BUDGET_WEBHOOK_URL
- Stdout: Print alerts if NADIRCLAW_BUDGET_STDOUT_ALERTS=true
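The warn-threshold logic amounts to a fraction check per recorded spend. The function below is a hedged sketch of that check, not NadirClaw's accounting code:

```python
def check_budget(spent, budget, warn_threshold=0.8):
    """Return 'ok', 'warn', or 'exceeded' for the current spend in USD."""
    if budget is None:
        return "ok"  # no budget configured
    if spent >= budget:
        return "exceeded"
    if spent >= budget * warn_threshold:
        return "warn"  # this is where the webhook / stdout alert would fire
    return "ok"
```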
Reporting
# See savings
nadirclaw savings
# Detailed report with cost breakdown
nadirclaw report --since 7d
# Live monitoring
nadirclaw dashboard
Reports include: total requests, tier distribution, per-model usage and tokens, latency percentiles (p50/p95), fallback counts, and error rates.
Docker
NadirClaw + Ollama (fully local, zero cost)
git clone https://github.com/doramirdor/NadirClaw.git && cd NadirClaw
docker compose up
This starts Ollama and NadirClaw on port 8856. Pull a model:
docker compose exec ollama ollama pull llama3.1:8b
With cloud providers
Create a .env file with API keys and model config (see .env.example), then restart:
# .env
GEMINI_API_KEY=AIza...
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
Standalone (no Ollama)
docker build -t nadirclaw .
docker run -p 8856:8856 --env-file .env nadirclaw
API Reference
Auth is disabled by default (local-only). Set NADIRCLAW_AUTH_TOKEN to require a bearer token.
| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | OpenAI-compatible completions with auto routing (supports stream: true) |
| /v1/classify | POST | Classify a prompt without calling an LLM |
| /v1/classify/batch | POST | Classify multiple prompts at once |
| /v1/models | GET | List available models |
| /v1/logs | GET | View recent request logs |
| /v1/cache | GET | Cache stats |
| /health | GET | Health check (no auth) |
| /dashboard | GET | Web dashboard UI |
Chat Completions Request
POST /v1/chat/completions
{
"model": "auto", // or "eco", "premium", "free", "reasoning", or a model alias
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2+2?"}
],
"stream": true, // optional, SSE streaming
"temperature": 0.7, // optional, passed through to provider
"tools": [...] // optional, triggers agentic detection
}
Classify Request
POST /v1/classify
{
"messages": [
{"role": "user", "content": "What is 2+2?"}
]
}
// Response:
{
"tier": "simple",
"confidence": 0.2848,
"score": 0.0,
"model": "gemini-3-flash-preview"
}
Troubleshooting
First request is slow (2–3 seconds)
Normal — NadirClaw downloads the sentence embedding model (~80 MB) on first use. Subsequent requests classify in ~10ms.
Port 8856 already in use
# Use a different port
nadirclaw serve --port 9000
# Or set in env
NADIRCLAW_PORT=9000
Model returning errors
- Check nadirclaw auth status to verify credentials
- Run with --verbose for detailed error messages
- Ensure your API key has access to the configured models
Ollama not found
# Auto-discover Ollama instances
nadirclaw ollama discover
# Or set manually
OLLAMA_API_BASE=http://192.168.1.100:11434 nadirclaw serve
Too many prompts routed to complex
Raise the confidence threshold to route more to simple:
NADIRCLAW_CONFIDENCE_THRESHOLD=0.10
Too many prompts routed to simple (quality issues)
Lower the confidence threshold:
NADIRCLAW_CONFIDENCE_THRESHOLD=0.03
Streaming not working
NadirClaw supports full SSE streaming. Ensure your request includes "stream": true and your client handles SSE format. Check that you're not behind a reverse proxy that buffers responses.
Rate limits (429 errors)
NadirClaw handles these automatically — retries once, then falls through the fallback chain. If all models are exhausted, configure additional fallbacks:
NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5-20250929,gemini-2.5-flash
Viewing logs
# Request logs
ls ~/.nadirclaw/logs/
# Full raw logging (for debugging)
nadirclaw serve --log-raw
# Analyze logs
nadirclaw report --since 24h