NadirClaw Documentation
Open-source LLM router that classifies prompt complexity and routes to the optimal model. Cut AI API costs by 40–70% with zero code changes.
Installation
pip (recommended)
pip install nadirclaw
One-line install script
curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/install.sh | sh
This clones to ~/.nadirclaw, creates a venv, installs deps, and adds nadirclaw to your PATH. Run it again to update.
From source
git clone https://github.com/doramirdor/NadirClaw.git
cd NadirClaw
python3 -m venv venv
source venv/bin/activate
pip install -e .
Docker
git clone https://github.com/doramirdor/NadirClaw.git && cd NadirClaw
docker compose up
Uninstall
rm -rf ~/.nadirclaw
sudo rm -f /usr/local/bin/nadirclaw
Quick Start
# Install
pip install nadirclaw
# Interactive setup (providers, API keys, models)
nadirclaw setup
# Start the router
nadirclaw serve --verbose
NadirClaw starts on http://localhost:8856 with sensible defaults (Gemini Flash for simple, Codex for complex). If you skip nadirclaw setup, the serve command will offer to run it on first launch.
First Run
On first request, NadirClaw downloads the all-MiniLM-L6-v2 sentence embedding model (~80 MB). This takes 2–3 seconds. Subsequent requests classify in ~10ms.
Prerequisites: Python 3.10+ and at least one LLM provider — a Gemini API key (free tier available), Ollama running locally (free), or any cloud provider API key.
Environment Variables
NadirClaw loads config from ~/.nadirclaw/.env. If that doesn't exist, it falls back to .env in the current directory.
| Variable | Default | Description |
|---|---|---|
| NADIRCLAW_SIMPLE_MODEL | gemini-3-flash-preview | Model for simple prompts |
| NADIRCLAW_COMPLEX_MODEL | openai-codex/gpt-5.3-codex | Model for complex prompts |
| NADIRCLAW_REASONING_MODEL | falls back to complex | Model for reasoning tasks |
| NADIRCLAW_MID_MODEL | none (enables 3-tier) | Model for mid-complexity prompts |
| NADIRCLAW_FREE_MODEL | falls back to simple | Free/local fallback model |
| NADIRCLAW_TIER_THRESHOLDS | 0.35,0.65 | Score thresholds for simple/mid/complex boundaries |
| NADIRCLAW_FALLBACK_CHAIN | all tier models | Comma-separated global cascade on failure |
| NADIRCLAW_SIMPLE_FALLBACK | none | Per-tier fallback for simple model failures |
| NADIRCLAW_COMPLEX_FALLBACK | none | Per-tier fallback for complex model failures |
| NADIRCLAW_CONFIDENCE_THRESHOLD | 0.06 | Classification threshold (lower = more complex) |
| NADIRCLAW_PORT | 8856 | Server port |
| NADIRCLAW_AUTH_TOKEN | empty (auth disabled) | Bearer token requirement |
| NADIRCLAW_LOG_DIR | ~/.nadirclaw/logs | Log directory |
| NADIRCLAW_LOG_RAW | false | Log full raw requests/responses |
| NADIRCLAW_DAILY_BUDGET | none | Daily spend limit in USD |
| NADIRCLAW_MONTHLY_BUDGET | none | Monthly spend limit in USD |
| NADIRCLAW_BUDGET_WARN_THRESHOLD | 0.8 | Alert at this fraction of budget |
| NADIRCLAW_BUDGET_WEBHOOK_URL | none | Webhook for budget alerts |
| NADIRCLAW_BUDGET_STDOUT_ALERTS | false | Print budget alerts to stdout |
| NADIRCLAW_CACHE_TTL | default | Cache time-to-live in seconds |
| NADIRCLAW_CACHE_MAX_SIZE | default | Max cache entries |
| NADIRCLAW_CACHE_ENABLED | true | Enable/disable prompt cache |
| NADIRCLAW_MODEL_RATE_LIMITS | none | Per-model RPM limits (e.g. gemini-3-flash-preview=30,gpt-4.1=60) |
| NADIRCLAW_DEFAULT_MODEL_RPM | 0 (unlimited) | Default RPM limit for all models |
| NADIRCLAW_API_BASE | none | Custom OpenAI-compatible endpoint (vLLM, LocalAI, etc.) |
| GEMINI_API_KEY | — | Google Gemini API key |
| ANTHROPIC_API_KEY | — | Anthropic API key |
| OPENAI_API_KEY | — | OpenAI API key |
| OLLAMA_API_BASE | http://localhost:11434 | Ollama base URL |
| OTEL_EXPORTER_OTLP_ENDPOINT | empty | OpenTelemetry collector endpoint |
Config File
The primary configuration lives in ~/.nadirclaw/.env. Example:
# ~/.nadirclaw/.env
# API keys
GEMINI_API_KEY=AIza...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Model routing
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
# Server
NADIRCLAW_PORT=8856
# Budget
NADIRCLAW_DAILY_BUDGET=10.00
NADIRCLAW_MONTHLY_BUDGET=200.00
Credentials are stored separately in ~/.nadirclaw/credentials.json (managed via nadirclaw auth). Logs go to ~/.nadirclaw/logs/.
Model Setup
Configure which model handles each routing tier:
| Setup | Simple Model | Complex Model | Keys Needed |
|---|---|---|---|
| Gemini + Gemini | gemini-2.5-flash | gemini-2.5-pro | GEMINI_API_KEY |
| Gemini + Claude | gemini-2.5-flash | claude-sonnet-4-5-20250929 | GEMINI_API_KEY + ANTHROPIC_API_KEY |
| Claude + Claude | claude-haiku-4-5-20251001 | claude-sonnet-4-5-20250929 | ANTHROPIC_API_KEY |
| OpenAI + OpenAI | gpt-4.1-mini | gpt-4.1 | OPENAI_API_KEY |
| Fully local | ollama/llama3.1:8b | ollama/qwen3:32b | None |
Gemini models are called natively via the Google GenAI SDK. All other models go through LiteLLM (100+ providers).
Model Aliases
Use short names instead of full model IDs:
| Alias | Resolves To |
|---|---|
| sonnet | claude-sonnet-4-5-20250929 |
| opus | claude-opus-4-6-20250918 |
| haiku | claude-haiku-4-5-20251001 |
| gpt4 | gpt-4.1 |
| gpt5 | gpt-5.2 |
| flash | gemini-2.5-flash |
| gemini-pro | gemini-2.5-pro |
| deepseek | deepseek/deepseek-chat |
| deepseek-r1 | deepseek/deepseek-reasoner |
| llama | ollama/llama3.1:8b |
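The alias table above amounts to a plain lookup that falls through to the raw model ID. A minimal sketch (how NadirClaw implements this internally is an assumption; the mapping itself mirrors the table):

```python
# Alias resolution: a dict lookup that returns the input unchanged when
# no alias matches, so full model IDs keep working.
ALIASES = {
    "sonnet": "claude-sonnet-4-5-20250929",
    "opus": "claude-opus-4-6-20250918",
    "haiku": "claude-haiku-4-5-20251001",
    "gpt4": "gpt-4.1",
    "gpt5": "gpt-5.2",
    "flash": "gemini-2.5-flash",
    "gemini-pro": "gemini-2.5-pro",
    "deepseek": "deepseek/deepseek-chat",
    "deepseek-r1": "deepseek/deepseek-reasoner",
    "llama": "ollama/llama3.1:8b",
}

def resolve_model(name: str) -> str:
    """Return the full model ID for an alias, or the name unchanged."""
    return ALIASES.get(name, name)
```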
Authentication
NadirClaw checks credentials in order: OpenClaw stored token → NadirClaw credential → environment variable.
# Add API keys
nadirclaw auth add --provider google --key AIza...
nadirclaw auth add --provider anthropic --key sk-ant-...
nadirclaw auth add --provider openai --key sk-...
# OAuth login (no API key needed)
nadirclaw auth openai login
nadirclaw auth anthropic login
nadirclaw auth gemini login
# Store Claude subscription token
nadirclaw auth setup-token
# Check status
nadirclaw auth status
# Remove
nadirclaw auth remove google
Routing
How Classification Works
NadirClaw uses a binary complexity classifier based on sentence embeddings:
- Pre-computed centroids — two tiny vectors (~1.5 KB each) derived from ~170 seed prompts, shipped with the package.
- Classification — computes the prompt's embedding via all-MiniLM-L6-v2 and measures cosine similarity to both centroids. Closer to the complex centroid → complex model.
- Borderline handling — when confidence is below the threshold (default 0.06), defaults to complex. It's cheaper to over-serve than under-serve.
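The steps above can be sketched in a few lines. The toy 3-d vectors stand in for the real 384-d all-MiniLM-L6-v2 embeddings, and the centroid values are made up for illustration (the real ones ship with the package):

```python
import math

# Hypothetical centroids; the real ones are ~1.5 KB 384-d vectors.
SIMPLE_CENTROID = [0.9, 0.1, 0.0]
COMPLEX_CENTROID = [0.1, 0.9, 0.1]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify(embedding, threshold=0.06):
    sim_simple = cosine(embedding, SIMPLE_CENTROID)
    sim_complex = cosine(embedding, COMPLEX_CENTROID)
    confidence = abs(sim_simple - sim_complex)
    if confidence < threshold:
        return "complex"   # borderline: over-serve rather than under-serve
    return "simple" if sim_simple > sim_complex else "complex"
```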
Routing Tiers
| Tier | When Used | Typical Prompts |
|---|---|---|
| Simple | Prompt closer to simple centroid with confidence above threshold | "What does this function do?", "Format this JSON", "Add a docstring" |
| Mid | Score between tier thresholds (requires NADIRCLAW_MID_MODEL) | "Write a unit test for this function", "Explain this error" |
| Complex | Prompt closer to complex centroid, or borderline | "Refactor this module", "Design a caching layer", "Debug this deadlock" |
| Reasoning | 2+ reasoning markers detected ("step by step", "prove that", "analyze tradeoffs") | Mathematical proofs, architecture analysis, critical evaluations |
| Agentic | Tool definitions, tool-role messages, agent system prompts, deep conversations (>10 messages) | Any multi-step agent workflow, coding agent sessions |
| Vision | Image content (image_url) detected in messages | Screenshot analysis, diagram reading, image-based questions |
Routing Modifiers
After base classification, these overrides apply in order:
- Agentic detection — forces complex when tool definitions, tool-role messages, or agent system prompts are detected
- Reasoning detection — routes to reasoning model when 2+ reasoning markers found
- Vision detection — swaps to a vision-capable model (GPT-4o, Claude, Gemini) when image_url content is detected
- Context window check — swaps to a model with larger context if the conversation exceeds the model's limit
- Session persistence — reuses the same model for follow-up messages (30-minute TTL)
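The session-persistence step can be pictured as a small TTL map. Only the 30-minute TTL comes from the docs; the session-key scheme and data structure are assumptions:

```python
import time

SESSION_TTL = 30 * 60  # 30 minutes, in seconds

class SessionStore:
    """Remember which model handled a session so follow-ups reuse it."""

    def __init__(self):
        self._sessions = {}  # session_id -> (model, last_seen_timestamp)

    def get(self, session_id, now=None):
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry and now - entry[1] < SESSION_TTL:
            return entry[0]   # still fresh: reuse the same model
        return None           # expired or unknown: re-classify the prompt

    def put(self, session_id, model, now=None):
        now = time.time() if now is None else now
        self._sessions[session_id] = (model, now)
```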
Confidence Threshold
The NADIRCLAW_CONFIDENCE_THRESHOLD (default 0.06) controls borderline routing. Lower values route more prompts to complex. Adjust based on your quality tolerance:
# More conservative (routes more to complex)
NADIRCLAW_CONFIDENCE_THRESHOLD=0.03
# More aggressive savings (routes more to simple)
NADIRCLAW_CONFIDENCE_THRESHOLD=0.10
Routing Profiles
Override routing strategy per-request via the model field:
| Profile | Model Field | Behavior |
|---|---|---|
| auto | auto or omit | Smart routing (default) |
| eco | eco | Always use simple model |
| premium | premium | Always use complex model |
| free | free | Use free/local fallback model |
| reasoning | reasoning | Use reasoning model |
# Use eco mode for maximum savings
curl http://localhost:8856/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "eco", "messages": [{"role": "user", "content": "Hello"}]}'
Three-Tier Routing
By default, NadirClaw uses binary routing (simple/complex). Enable three-tier routing by setting a mid model:
# Enable three-tier routing
NADIRCLAW_MID_MODEL=gpt-4.1-mini
# Customize tier boundaries (default: 0.35,0.65)
NADIRCLAW_TIER_THRESHOLDS=0.35,0.65
With three-tier routing, classification scores map to tiers:
| Score Range | Tier | Model |
|---|---|---|
| 0.00 – 0.35 | Simple | NADIRCLAW_SIMPLE_MODEL |
| 0.35 – 0.65 | Mid | NADIRCLAW_MID_MODEL |
| 0.65 – 1.00 | Complex | NADIRCLAW_COMPLEX_MODEL |
If NADIRCLAW_MID_MODEL is not set, NadirClaw falls back to binary routing (simple/complex).
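The score-to-tier mapping in the table can be sketched as below. Treating each boundary value as belonging to the upper tier is an assumption; the table only gives the ranges:

```python
def pick_tier(score: float, thresholds=(0.35, 0.65)) -> str:
    """Map a classification score in [0, 1] to a routing tier."""
    low, high = thresholds   # from NADIRCLAW_TIER_THRESHOLDS
    if score < low:
        return "simple"
    if score < high:
        return "mid"
    return "complex"
```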
Fallback Chains
When a model fails (429 rate limit, 5xx error, or timeout), NadirClaw cascades through a configurable chain of fallback models until one succeeds.
# Configure fallback order
NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5-20250929,gemini-2.5-flash
Per-tier fallback chains
You can also configure fallback chains per tier for more granular control:
# Per-tier fallback chains
NADIRCLAW_SIMPLE_FALLBACK=gemini-2.5-flash,gemini-3-flash-preview
NADIRCLAW_COMPLEX_FALLBACK=gpt-4.1,claude-sonnet-4-5-20250929
NADIRCLAW_MID_FALLBACK=gpt-4.1-mini,gemini-2.5-flash
Default behavior: If no fallback chain is configured, NadirClaw uses all your configured tier models: [COMPLEX_MODEL, MID_MODEL, SIMPLE_MODEL, REASONING_MODEL, FREE_MODEL].
Rate limit handling: On 429 errors, NadirClaw automatically retries once before moving to the next model in the chain. If all models are exhausted, it returns a friendly error message.
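The cascade-with-retry behavior described above can be sketched as follows. The `call_model` callable stands in for the real dispatch, and the exception types are assumptions for illustration:

```python
class RateLimited(Exception):
    """Stand-in for a 429 from the provider."""

class ProviderError(Exception):
    """Stand-in for a 5xx or timeout."""

def cascade(chain, call_model):
    """Try each model in order; retry once on rate limit, then move on."""
    errors = []
    for model in chain:
        for _attempt in range(2):          # initial try + one 429 retry
            try:
                return model, call_model(model)
            except RateLimited as exc:
                errors.append((model, exc))
                continue                    # retry the same model once
            except ProviderError as exc:
                errors.append((model, exc))
                break                       # 5xx/timeout: next model
    raise RuntimeError(f"all models exhausted: {errors!r}")
```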
Rate Limiting
Configure per-model request rate limits to stay within provider quotas:
# Per-model RPM limits
NADIRCLAW_MODEL_RATE_LIMITS=gemini-3-flash-preview=30,gpt-4.1=60
# Default RPM for all models (0 = unlimited)
NADIRCLAW_DEFAULT_MODEL_RPM=0
When a model hits its RPM limit, NadirClaw automatically triggers the fallback chain rather than returning an error. Monitor rate limit status via the API:
curl http://localhost:8856/v1/rate-limits
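The limit string format and per-model cap can be sketched as below. The parsing matches the documented `model=rpm,model=rpm` syntax; the sliding-window mechanism is an assumption (NadirClaw may use a different scheme internally):

```python
def parse_limits(spec: str) -> dict:
    """'gemini-3-flash-preview=30,gpt-4.1=60' -> {'gemini-...': 30, ...}"""
    out = {}
    for part in spec.split(","):
        model, _, rpm = part.partition("=")
        out[model.strip()] = int(rpm)
    return out

class RpmLimiter:
    def __init__(self, limits, default_rpm=0):
        self.limits = limits
        self.default_rpm = default_rpm  # 0 = unlimited
        self.history = {}               # model -> recent request timestamps

    def allow(self, model, now):
        rpm = self.limits.get(model, self.default_rpm)
        if rpm == 0:
            return True
        # Keep only requests from the last 60 seconds.
        window = [t for t in self.history.get(model, []) if now - t < 60]
        if len(window) >= rpm:
            return False                # over limit: trigger the fallback chain
        window.append(now)
        self.history[model] = window
        return True
```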
Prompt Caching
NadirClaw includes an in-memory LRU cache for identical chat completions, skipping redundant LLM calls entirely.
# Configure cache
NADIRCLAW_CACHE_TTL=300 # TTL in seconds (default varies)
NADIRCLAW_CACHE_MAX_SIZE=1000 # Max cached entries
Monitor cache:
# CLI
nadirclaw cache
# API endpoint
curl http://localhost:8856/v1/cache
Cache is keyed on the full message content. Streaming requests with identical content will also hit the cache. Only exact matches count — no fuzzy matching.
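An exact-match LRU + TTL cache like the one described can be sketched as below. The parameter names mirror the env vars; the internals (hash key, OrderedDict) are illustrative, not NadirClaw's actual implementation:

```python
import hashlib
import json
import time
from collections import OrderedDict

class PromptCache:
    def __init__(self, ttl=300, max_size=1000):
        self.ttl = ttl                # NADIRCLAW_CACHE_TTL
        self.max_size = max_size      # NADIRCLAW_CACHE_MAX_SIZE
        self._data = OrderedDict()    # key -> (response, timestamp)

    @staticmethod
    def key(messages):
        # Exact match only: hash of the canonical JSON of the messages.
        raw = json.dumps(messages, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, messages, now=None):
        now = time.time() if now is None else now
        hit = self._data.get(self.key(messages))
        if hit is None or now - hit[1] > self.ttl:
            return None               # miss or expired
        self._data.move_to_end(self.key(messages))  # refresh LRU position
        return hit[0]

    def put(self, messages, response, now=None):
        now = time.time() if now is None else now
        k = self.key(messages)
        self._data[k] = (response, now)
        self._data.move_to_end(k)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least-recently-used
```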
Claude Code
NadirClaw works as a drop-in proxy for Claude Code:
# Point Claude Code at NadirClaw
export ANTHROPIC_BASE_URL=http://localhost:8856/v1
export ANTHROPIC_API_KEY=local
# Start NadirClaw, then use Claude Code normally
nadirclaw serve --verbose
claude
Or use a shell alias:
alias claude-routed='ANTHROPIC_BASE_URL=http://localhost:8856/v1 ANTHROPIC_API_KEY=local claude'
Simple prompts ("read this file", "what does this function do?") route to a cheap model like Gemini Flash. Complex prompts ("refactor this module") stay on Claude. Typical savings: 40–70%.
Using your Claude subscription
# OAuth login (opens browser)
nadirclaw auth anthropic login
# Or store token directly
nadirclaw auth setup-token
OpenClaw
# Auto-configure OpenClaw to use NadirClaw
nadirclaw openclaw onboard
# Start the router
nadirclaw serve
This writes NadirClaw as a provider in ~/.openclaw/openclaw.json with model nadirclaw/auto. OpenClaw auto-reloads — no restart needed.
The generated config:
{
"models": {
"providers": {
"nadirclaw": {
"baseUrl": "http://localhost:8856/v1",
"apiKey": "local",
"api": "openai-completions",
"models": [{ "id": "auto", "name": "auto" }]
}
}
},
"agents": {
"defaults": {
"model": { "primary": "nadirclaw/auto" }
}
}
}
Codex
# Auto-configure Codex
nadirclaw codex onboard
# Start the router
nadirclaw serve
This writes ~/.codex/config.toml:
model_provider = "nadirclaw"
[model_providers.nadirclaw]
base_url = "http://localhost:8856/v1"
api_key = "local"
OpenAI OAuth
# Use ChatGPT subscription instead of API key
nadirclaw auth openai login
Continue
# Auto-configure Continue
nadirclaw continue onboard
This writes NadirClaw as a provider in ~/.continue/config.json. Continue reads the config on startup.
Cursor
# Show setup instructions
nadirclaw cursor onboard
In Cursor: Settings → Models → OpenAI API Key: local, Base URL: http://localhost:8856/v1. Select a model or use auto.
Open WebUI
# Show setup instructions
nadirclaw openwebui onboard
In Open WebUI: Admin Settings → Connections → OpenAI → Add Connection. Set URL to http://localhost:8856/v1 and API Key to local. Open WebUI auto-discovers NadirClaw's routing profiles and tier models via /v1/models.
Any OpenAI-Compatible Client
NadirClaw exposes a standard OpenAI-compatible API. Point any tool at it:
# Base URL: http://localhost:8856/v1
# Model: "auto" (or omit)
# API Key: "local" (or anything — auth disabled by default)
Python (openai SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8856/v1",
api_key="local",
)
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.choices[0].message.content)
curl
curl http://localhost:8856/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "What is 2+2?"}],
"stream": true
}'
Works with Continue, Aider, Cursor, Windsurf, or any tool that speaks the OpenAI chat completions API. Just set the base URL to http://localhost:8856/v1.
Custom Endpoints (vLLM, LocalAI, LM Studio)
NadirClaw can route to any OpenAI-compatible endpoint:
NADIRCLAW_API_BASE=http://your-server:8000/v1 \
NADIRCLAW_SIMPLE_MODEL=openai/your-small-model \
NADIRCLAW_COMPLEX_MODEL=openai/your-large-model \
nadirclaw serve --verbose
CLI Reference
nadirclaw serve
Start the router server.
nadirclaw serve [OPTIONS]
Options:
--port INTEGER Port (default: 8856)
--simple-model TEXT Model for simple prompts
--complex-model TEXT Model for complex prompts
--token TEXT Auth token
--verbose Debug logging
--log-raw Log full raw requests/responses to JSONL
nadirclaw setup
Interactive setup wizard — guides you through providers, API keys, and model selection.
nadirclaw setup
nadirclaw classify
Classify a prompt locally without running the server:
$ nadirclaw classify "What is 2+2?"
Tier: simple
Confidence: 0.2848
Score: 0.0000
Model: gemini-3-flash-preview
$ nadirclaw classify "Design a distributed system for real-time trading"
Tier: complex
Confidence: 0.1843
Score: 1.0000
Model: gemini-2.5-pro
nadirclaw report
Analyze request logs:
nadirclaw report # full report
nadirclaw report --since 24h # last 24 hours
nadirclaw report --since 7d # last 7 days
nadirclaw report --model gemini # filter by model
nadirclaw report --format json # machine-readable JSON
nadirclaw report --export report.txt # save to file
nadirclaw savings
Show cost savings with monthly projections:
nadirclaw savings
nadirclaw savings --since 7d
nadirclaw dashboard
Live terminal dashboard with real-time stats. Also available as web UI at http://localhost:8856/dashboard.
pip install nadirclaw[dashboard]
nadirclaw dashboard
nadirclaw status
Show current config, credentials, and server status:
$ nadirclaw status
NadirClaw Status
----------------------------------------
Simple model: gemini-3-flash-preview
Complex model: gemini-2.5-pro
Port: 8856
Threshold: 0.06
Server: RUNNING (ok)
nadirclaw test
Probe each configured model to verify credentials and connectivity. CI-friendly (exits 1 on failure):
nadirclaw test
nadirclaw test --simple-model gemini-2.5-flash --complex-model gpt-4.1
nadirclaw export
Export request logs for offline analysis:
nadirclaw export --format csv --since 7d
nadirclaw export --format jsonl
nadirclaw budget
Show real-time budget status and alerts:
nadirclaw budget
Other Commands
nadirclaw auth add/status/remove # Manage credentials
nadirclaw auth openai login # OAuth login (ChatGPT subscription)
nadirclaw auth anthropic login # OAuth login (Claude subscription)
nadirclaw auth gemini login # OAuth login (Google Gemini)
nadirclaw auth setup-token # Store Claude subscription token
nadirclaw codex onboard # Configure Codex integration
nadirclaw openclaw onboard # Configure OpenClaw integration
nadirclaw continue onboard # Configure Continue integration
nadirclaw cursor onboard # Show Cursor setup instructions
nadirclaw openwebui onboard # Show Open WebUI setup
nadirclaw ollama discover # Auto-discover Ollama instances
nadirclaw ollama discover --scan-network # Network-wide scan
nadirclaw cache # View cache stats
nadirclaw build-centroids # Regenerate centroid vectors
Budget & Cost Tracking
NadirClaw tracks per-request costs in real time and supports budget limits with alerts.
Setting Budgets
# In ~/.nadirclaw/.env
NADIRCLAW_DAILY_BUDGET=10.00
NADIRCLAW_MONTHLY_BUDGET=200.00
NADIRCLAW_BUDGET_WARN_THRESHOLD=0.8 # Alert at 80% of budget
Alerts
When spend crosses the warning threshold, NadirClaw can:
- Webhook: POST a JSON payload to NADIRCLAW_BUDGET_WEBHOOK_URL
- Stdout: Print alerts if NADIRCLAW_BUDGET_STDOUT_ALERTS=true
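A minimal stdlib-only receiver for the webhook might look like the sketch below. The exact payload fields NadirClaw sends aren't specified here, so this just prints whatever JSON body arrives:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print("budget alert:", payload)   # replace with paging/Slack/etc.
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("localhost", 9100), AlertHandler).serve_forever()
# then set NADIRCLAW_BUDGET_WEBHOOK_URL=http://localhost:9100/
# (port 9100 is an arbitrary choice)
```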
Reporting
# See savings
nadirclaw savings
# Detailed report with cost breakdown
nadirclaw report --since 7d
# Live monitoring
nadirclaw dashboard
Reports include: total requests, tier distribution, per-model usage and tokens, latency percentiles (p50/p95), fallback counts, and error rates.
Prometheus Metrics
NadirClaw exposes a /metrics endpoint in Prometheus format with zero extra dependencies.
curl http://localhost:8856/metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| nadirclaw_requests_total | counter | model, tier, status | Total completed LLM requests |
| nadirclaw_tokens_prompt_total | counter | model | Total prompt tokens |
| nadirclaw_tokens_completion_total | counter | model | Total completion tokens |
| nadirclaw_cost_dollars_total | counter | model | Estimated cost in USD |
| nadirclaw_request_latency_ms | histogram | model, tier | Request latency distribution |
| nadirclaw_cache_hits_total | counter | — | Prompt cache hits |
| nadirclaw_fallbacks_total | counter | from_model, to_model | Fallback events |
| nadirclaw_errors_total | counter | model, error_type | Request errors |
| nadirclaw_uptime_seconds | gauge | — | Seconds since start |
Add NadirClaw as a Prometheus scrape target:
# prometheus.yml
scrape_configs:
- job_name: nadirclaw
static_configs:
- targets: ['localhost:8856']
OpenTelemetry Tracing
Optional distributed tracing with GenAI semantic conventions. Install the telemetry extra:
pip install nadirclaw[telemetry]
# Point to your collector
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 nadirclaw serve
Emitted spans:
- smart_route_analysis — classifier decision (tier, confidence, model selected)
- dispatch_model — LLM provider call (model, tokens, latency)
- chat_completion — full request lifecycle
Includes GenAI semantic conventions plus custom nadirclaw.* attributes for routing metadata.
Docker
NadirClaw + Ollama (fully local, zero cost)
git clone https://github.com/doramirdor/NadirClaw.git && cd NadirClaw
docker compose up
This starts Ollama and NadirClaw on port 8856. Pull a model:
docker compose exec ollama ollama pull llama3.1:8b
With cloud providers
Create a .env file with API keys and model config (see .env.example), then restart:
# .env
GEMINI_API_KEY=AIza...
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
Standalone (no Ollama)
docker build -t nadirclaw .
docker run -p 8856:8856 --env-file .env nadirclaw
API Reference
Auth is disabled by default (local-only). Set NADIRCLAW_AUTH_TOKEN to require a bearer token.
| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | OpenAI-compatible completions with auto routing (supports stream: true) |
| /v1/classify | POST | Classify a prompt without calling an LLM |
| /v1/classify/batch | POST | Classify multiple prompts at once |
| /v1/models | GET | List available models |
| /v1/logs | GET | View recent request logs |
| /v1/cache | GET | Cache stats |
| /v1/rate-limits | GET | Per-model rate limit status |
| /v1/budget | GET | Budget status and alerts |
| /metrics | GET | Prometheus metrics |
| /health | GET | Health check (no auth) |
| /dashboard | GET | Web dashboard UI |
Chat Completions Request
POST /v1/chat/completions
{
"model": "auto", // or "eco", "premium", "free", "reasoning", or a model alias
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2+2?"}
],
"stream": true, // optional, SSE streaming
"temperature": 0.7, // optional, passed through to provider
"tools": [...] // optional, triggers agentic detection
}
Classify Request
POST /v1/classify
{
"messages": [
{"role": "user", "content": "What is 2+2?"}
]
}
// Response:
{
"tier": "simple",
"confidence": 0.2848,
"score": 0.0,
"model": "gemini-3-flash-preview"
}
Troubleshooting
First request is slow (2–3 seconds)
Normal — NadirClaw downloads the sentence embedding model (~80 MB) on first use. Subsequent requests classify in ~10ms.
Port 8856 already in use
# Use a different port
nadirclaw serve --port 9000
# Or set in env
NADIRCLAW_PORT=9000
Model returning errors
- Check nadirclaw auth status to verify credentials
- Run with --verbose for detailed error messages
- Ensure your API key has access to the configured models
Ollama not found
# Auto-discover Ollama instances
nadirclaw ollama discover
# Or set manually
OLLAMA_API_BASE=http://192.168.1.100:11434 nadirclaw serve
Too many prompts routed to complex
Raise the confidence threshold to route more to simple:
NADIRCLAW_CONFIDENCE_THRESHOLD=0.10
Too many prompts routed to simple (quality issues)
Lower the confidence threshold:
NADIRCLAW_CONFIDENCE_THRESHOLD=0.03
Streaming not working
NadirClaw supports full SSE streaming. Ensure your request includes "stream": true and your client handles SSE format. Check that you're not behind a reverse proxy that buffers responses.
Rate limits (429 errors)
NadirClaw handles these automatically — retries once, then falls through the fallback chain. If all models are exhausted, configure additional fallbacks:
NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5-20250929,gemini-2.5-flash
Viewing logs
# Request logs
ls ~/.nadirclaw/logs/
# Full raw logging (for debugging)
nadirclaw serve --log-raw
# Analyze logs
nadirclaw report --since 24h