NadirClaw Documentation
Open-source LLM router that classifies prompt complexity and routes to the optimal model. Cut AI API costs by 40–70% with zero code changes.
Installation
pip (recommended)
pip install nadirclaw
One-line install script
curl -fsSL https://raw.githubusercontent.com/doramirdor/NadirClaw/main/install.sh | sh
This clones to ~/.nadirclaw, creates a venv, installs deps, and adds nadirclaw to your PATH. Run it again to update.
From source
git clone https://github.com/doramirdor/NadirClaw.git
cd NadirClaw
python3 -m venv venv
source venv/bin/activate
pip install -e .
Docker
git clone https://github.com/doramirdor/NadirClaw.git && cd NadirClaw
docker compose up
Uninstall
rm -rf ~/.nadirclaw
sudo rm -f /usr/local/bin/nadirclaw
Quick Start
# Install
pip install nadirclaw
# Interactive setup (providers, API keys, models)
nadirclaw setup
# Start the router
nadirclaw serve --verbose
NadirClaw starts on http://localhost:8856 with sensible defaults (Gemini Flash for simple, Codex for complex). If you skip nadirclaw setup, the serve command will offer to run it on first launch.
First Run
On first request, NadirClaw downloads the all-MiniLM-L6-v2 sentence embedding model (~80 MB). This takes 2–3 seconds. Subsequent requests classify in ~10ms.
Prerequisites: Python 3.10+ and at least one LLM provider — a Gemini API key (free tier available), Ollama running locally (free), or any cloud provider API key.
Environment Variables
NadirClaw loads config from ~/.nadirclaw/.env. If that doesn't exist, it falls back to .env in the current directory.
| Variable | Default | Description |
|---|---|---|
| NADIRCLAW_SIMPLE_MODEL | gemini-3-flash-preview | Model for simple prompts |
| NADIRCLAW_COMPLEX_MODEL | openai-codex/gpt-5.3-codex | Model for complex prompts |
| NADIRCLAW_REASONING_MODEL | falls back to complex | Model for reasoning tasks |
| NADIRCLAW_MID_MODEL | none (enables 3-tier) | Model for mid-complexity prompts |
| NADIRCLAW_FREE_MODEL | falls back to simple | Free/local fallback model |
| NADIRCLAW_TIER_THRESHOLDS | 0.35,0.65 | Score thresholds for simple/mid/complex boundaries |
| NADIRCLAW_FALLBACK_CHAIN | all tier models | Comma-separated global cascade on failure |
| NADIRCLAW_SIMPLE_FALLBACK | none | Per-tier fallback for simple model failures |
| NADIRCLAW_COMPLEX_FALLBACK | none | Per-tier fallback for complex model failures |
| NADIRCLAW_CONFIDENCE_THRESHOLD | 0.06 | Classification threshold (lower = more complex) |
| NADIRCLAW_PORT | 8856 | Server port |
| NADIRCLAW_AUTH_TOKEN | empty (auth disabled) | Bearer token requirement |
| NADIRCLAW_LOG_DIR | ~/.nadirclaw/logs | Log directory |
| NADIRCLAW_LOG_RAW | false | Log full raw requests/responses |
| NADIRCLAW_DAILY_BUDGET | none | Daily spend limit in USD |
| NADIRCLAW_MONTHLY_BUDGET | none | Monthly spend limit in USD |
| NADIRCLAW_BUDGET_WARN_THRESHOLD | 0.8 | Alert at this fraction of budget |
| NADIRCLAW_BUDGET_WEBHOOK_URL | none | Webhook for budget alerts |
| NADIRCLAW_BUDGET_STDOUT_ALERTS | false | Print budget alerts to stdout |
| NADIRCLAW_CACHE_TTL | default | Cache time-to-live in seconds |
| NADIRCLAW_CACHE_MAX_SIZE | default | Max cache entries |
| NADIRCLAW_CACHE_ENABLED | true | Enable/disable prompt cache |
| NADIRCLAW_MODEL_RATE_LIMITS | none | Per-model RPM limits (e.g. gemini-3-flash-preview=30,gpt-4.1=60) |
| NADIRCLAW_DEFAULT_MODEL_RPM | 0 (unlimited) | Default RPM limit for all models |
| NADIRCLAW_API_BASE | none | Custom OpenAI-compatible endpoint (vLLM, LocalAI, etc.) |
| GEMINI_API_KEY | — | Google Gemini API key |
| ANTHROPIC_API_KEY | — | Anthropic API key |
| OPENAI_API_KEY | — | OpenAI API key |
| OLLAMA_API_BASE | http://localhost:11434 | Ollama base URL |
| OTEL_EXPORTER_OTLP_ENDPOINT | empty | OpenTelemetry collector endpoint |
Config File
The primary configuration lives in ~/.nadirclaw/.env. Example:
# ~/.nadirclaw/.env
# API keys
GEMINI_API_KEY=AIza...
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Model routing
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
# Server
NADIRCLAW_PORT=8856
# Budget
NADIRCLAW_DAILY_BUDGET=10.00
NADIRCLAW_MONTHLY_BUDGET=200.00
Credentials are stored separately in ~/.nadirclaw/credentials.json (managed via nadirclaw auth). Logs go to ~/.nadirclaw/logs/.
Model Setup
Configure which model handles each routing tier:
| Setup | Simple Model | Complex Model | Keys Needed |
|---|---|---|---|
| Gemini + Gemini | gemini-2.5-flash | gemini-2.5-pro | GEMINI_API_KEY |
| Gemini + Claude | gemini-2.5-flash | claude-sonnet-4-5-20250929 | GEMINI_API_KEY + ANTHROPIC_API_KEY |
| Claude + Claude | claude-haiku-4-5-20251001 | claude-sonnet-4-5-20250929 | ANTHROPIC_API_KEY |
| OpenAI + OpenAI | gpt-4.1-mini | gpt-4.1 | OPENAI_API_KEY |
| Fully local | ollama/llama3.1:8b | ollama/qwen3:32b | None |
Gemini models are called natively via the Google GenAI SDK. All other models go through LiteLLM (100+ providers).
Model Aliases
Use short names instead of full model IDs:
| Alias | Resolves To |
|---|---|
| sonnet | claude-sonnet-4-5-20250929 |
| opus | claude-opus-4-6-20250918 |
| haiku | claude-haiku-4-5-20251001 |
| gpt4 | gpt-4.1 |
| gpt5 | gpt-5.2 |
| flash | gemini-2.5-flash |
| gemini-pro | gemini-2.5-pro |
| deepseek | deepseek/deepseek-chat |
| deepseek-r1 | deepseek/deepseek-reasoner |
| llama | ollama/llama3.1:8b |
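The alias table above amounts to a plain lookup that falls through to the raw model ID. A minimal sketch (how NadirClaw implements this internally is an assumption; the mapping itself mirrors the table):

```python
# Alias resolution: a dict lookup that returns the input unchanged when
# no alias matches, so full model IDs keep working.
ALIASES = {
    "sonnet": "claude-sonnet-4-5-20250929",
    "opus": "claude-opus-4-6-20250918",
    "haiku": "claude-haiku-4-5-20251001",
    "gpt4": "gpt-4.1",
    "gpt5": "gpt-5.2",
    "flash": "gemini-2.5-flash",
    "gemini-pro": "gemini-2.5-pro",
    "deepseek": "deepseek/deepseek-chat",
    "deepseek-r1": "deepseek/deepseek-reasoner",
    "llama": "ollama/llama3.1:8b",
}

def resolve_model(name: str) -> str:
    """Return the full model ID for an alias, or the name unchanged."""
    return ALIASES.get(name, name)
```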
Authentication
NadirClaw checks credentials in order: OpenClaw stored token → NadirClaw credential → environment variable.
# Add API keys
nadirclaw auth add --provider google --key AIza...
nadirclaw auth add --provider anthropic --key sk-ant-...
nadirclaw auth add --provider openai --key sk-...
# OAuth login (no API key needed)
nadirclaw auth openai login
nadirclaw auth anthropic login
nadirclaw auth gemini login
# Store Claude subscription token
nadirclaw auth setup-token
# Check status
nadirclaw auth status
# Remove
nadirclaw auth remove google
Routing
How Classification Works
NadirClaw uses a binary complexity classifier based on sentence embeddings:
- Pre-computed centroids — two tiny vectors (~1.5 KB each) derived from ~170 seed prompts, shipped with the package.
- Classification — computes the prompt's embedding via all-MiniLM-L6-v2 and measures cosine similarity to both centroids. Closer to the complex centroid → complex model.
- Borderline handling — when confidence is below the threshold (default 0.06), defaults to complex. It's cheaper to over-serve than under-serve.
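The steps above can be sketched in a few lines. The toy 3-d vectors stand in for the real 384-d all-MiniLM-L6-v2 embeddings, and the centroid values are made up for illustration (the real ones ship with the package):

```python
import math

# Hypothetical centroids; the real ones are ~1.5 KB 384-d vectors.
SIMPLE_CENTROID = [0.9, 0.1, 0.0]
COMPLEX_CENTROID = [0.1, 0.9, 0.1]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify(embedding, threshold=0.06):
    sim_simple = cosine(embedding, SIMPLE_CENTROID)
    sim_complex = cosine(embedding, COMPLEX_CENTROID)
    confidence = abs(sim_simple - sim_complex)
    if confidence < threshold:
        return "complex"   # borderline: over-serve rather than under-serve
    return "simple" if sim_simple > sim_complex else "complex"
```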
Routing Tiers
| Tier | When Used | Typical Prompts |
|---|---|---|
| Simple | Prompt closer to simple centroid with confidence above threshold | "What does this function do?", "Format this JSON", "Add a docstring" |
| Mid | Score between tier thresholds (requires NADIRCLAW_MID_MODEL) | "Write a unit test for this function", "Explain this error" |
| Complex | Prompt closer to complex centroid, or borderline | "Refactor this module", "Design a caching layer", "Debug this deadlock" |
| Reasoning | 2+ reasoning markers detected ("step by step", "prove that", "analyze tradeoffs") | Mathematical proofs, architecture analysis, critical evaluations |
| Agentic | Tool definitions, tool-role messages, agent system prompts, deep conversations (>10 messages) | Any multi-step agent workflow, coding agent sessions |
| Vision | Image content (image_url) detected in messages | Screenshot analysis, diagram reading, image-based questions |
Routing Modifiers
After base classification, these overrides apply in order:
- Agentic detection — forces complex when tool definitions, tool-role messages, or agent system prompts are detected
- Reasoning detection — routes to reasoning model when 2+ reasoning markers found
- Vision detection — swaps to a vision-capable model (GPT-4o, Claude, Gemini) when image_url content is detected
- Context window check — swaps to a model with larger context if the conversation exceeds the model's limit
- Session persistence — reuses the same model for follow-up messages (30-minute TTL)
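The session-persistence step can be pictured as a small TTL map. Only the 30-minute TTL comes from the docs; the session-key scheme and data structure are assumptions:

```python
import time

SESSION_TTL = 30 * 60  # 30 minutes, in seconds

class SessionStore:
    """Remember which model handled a session so follow-ups reuse it."""

    def __init__(self):
        self._sessions = {}  # session_id -> (model, last_seen_timestamp)

    def get(self, session_id, now=None):
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry and now - entry[1] < SESSION_TTL:
            return entry[0]   # still fresh: reuse the same model
        return None           # expired or unknown: re-classify the prompt

    def put(self, session_id, model, now=None):
        now = time.time() if now is None else now
        self._sessions[session_id] = (model, now)
```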
Confidence Threshold
The NADIRCLAW_CONFIDENCE_THRESHOLD (default 0.06) controls borderline routing. Lower values route more prompts to complex. Adjust based on your quality tolerance:
# More conservative (routes more to complex)
NADIRCLAW_CONFIDENCE_THRESHOLD=0.03
# More aggressive savings (routes more to simple)
NADIRCLAW_CONFIDENCE_THRESHOLD=0.10
Routing Profiles
Override routing strategy per-request via the model field:
| Profile | Model Field | Behavior |
|---|---|---|
| auto | auto or omit | Smart routing (default) |
| eco | eco | Always use simple model |
| premium | premium | Always use complex model |
| free | free | Use free/local fallback model |
| reasoning | reasoning | Use reasoning model |
# Use eco mode for maximum savings
curl http://localhost:8856/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "eco", "messages": [{"role": "user", "content": "Hello"}]}'
Three-Tier Routing
By default, NadirClaw uses binary routing (simple/complex). Enable three-tier routing by setting a mid model:
# Enable three-tier routing
NADIRCLAW_MID_MODEL=gpt-4.1-mini
# Customize tier boundaries (default: 0.35,0.65)
NADIRCLAW_TIER_THRESHOLDS=0.35,0.65
With three-tier routing, classification scores map to tiers:
| Score Range | Tier | Model |
|---|---|---|
| 0.00 – 0.35 | Simple | NADIRCLAW_SIMPLE_MODEL |
| 0.35 – 0.65 | Mid | NADIRCLAW_MID_MODEL |
| 0.65 – 1.00 | Complex | NADIRCLAW_COMPLEX_MODEL |
If NADIRCLAW_MID_MODEL is not set, NadirClaw falls back to binary routing (simple/complex).
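The score-to-tier mapping in the table can be sketched as below. Treating each boundary value as belonging to the upper tier is an assumption; the table only gives the ranges:

```python
def pick_tier(score: float, thresholds=(0.35, 0.65)) -> str:
    """Map a classification score in [0, 1] to a routing tier."""
    low, high = thresholds   # from NADIRCLAW_TIER_THRESHOLDS
    if score < low:
        return "simple"
    if score < high:
        return "mid"
    return "complex"
```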
Fallback Chains
When a model fails (429 rate limit, 5xx error, or timeout), NadirClaw cascades through a configurable chain of fallback models until one succeeds.
# Configure fallback order
NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5-20250929,gemini-2.5-flash
Per-tier fallback chains
You can also configure fallback chains per tier for more granular control:
# Per-tier fallback chains
NADIRCLAW_SIMPLE_FALLBACK=gemini-2.5-flash,gemini-3-flash-preview
NADIRCLAW_COMPLEX_FALLBACK=gpt-4.1,claude-sonnet-4-5-20250929
NADIRCLAW_MID_FALLBACK=gpt-4.1-mini,gemini-2.5-flash
Default behavior: If no fallback chain is configured, NadirClaw uses all your configured tier models: [COMPLEX_MODEL, MID_MODEL, SIMPLE_MODEL, REASONING_MODEL, FREE_MODEL].
Rate limit handling: On 429 errors, NadirClaw automatically retries once before moving to the next model in the chain. If all models are exhausted, it returns a friendly error message.
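The cascade-with-retry behavior described above can be sketched as follows. The `call_model` callable stands in for the real dispatch, and the exception types are assumptions for illustration:

```python
class RateLimited(Exception):
    """Stand-in for a 429 from the provider."""

class ProviderError(Exception):
    """Stand-in for a 5xx or timeout."""

def cascade(chain, call_model):
    """Try each model in order; retry once on rate limit, then move on."""
    errors = []
    for model in chain:
        for _attempt in range(2):          # initial try + one 429 retry
            try:
                return model, call_model(model)
            except RateLimited as exc:
                errors.append((model, exc))
                continue                    # retry the same model once
            except ProviderError as exc:
                errors.append((model, exc))
                break                       # 5xx/timeout: next model
    raise RuntimeError(f"all models exhausted: {errors!r}")
```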
Rate Limiting
Configure per-model request rate limits to stay within provider quotas:
# Per-model RPM limits
NADIRCLAW_MODEL_RATE_LIMITS=gemini-3-flash-preview=30,gpt-4.1=60
# Default RPM for all models (0 = unlimited)
NADIRCLAW_DEFAULT_MODEL_RPM=0
When a model hits its RPM limit, NadirClaw automatically triggers the fallback chain rather than returning an error. Monitor rate limit status via the API:
curl http://localhost:8856/v1/rate-limits
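The limit string format and per-model cap can be sketched as below. The parsing matches the documented `model=rpm,model=rpm` syntax; the sliding-window mechanism is an assumption (NadirClaw may use a different scheme internally):

```python
def parse_limits(spec: str) -> dict:
    """'gemini-3-flash-preview=30,gpt-4.1=60' -> {'gemini-...': 30, ...}"""
    out = {}
    for part in spec.split(","):
        model, _, rpm = part.partition("=")
        out[model.strip()] = int(rpm)
    return out

class RpmLimiter:
    def __init__(self, limits, default_rpm=0):
        self.limits = limits
        self.default_rpm = default_rpm  # 0 = unlimited
        self.history = {}               # model -> recent request timestamps

    def allow(self, model, now):
        rpm = self.limits.get(model, self.default_rpm)
        if rpm == 0:
            return True
        # Keep only requests from the last 60 seconds.
        window = [t for t in self.history.get(model, []) if now - t < 60]
        if len(window) >= rpm:
            return False                # over limit: trigger the fallback chain
        window.append(now)
        self.history[model] = window
        return True
```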
Prompt Caching
NadirClaw includes an in-memory LRU cache for identical chat completions, skipping redundant LLM calls entirely.
# Configure cache
NADIRCLAW_CACHE_TTL=300 # TTL in seconds (default varies)
NADIRCLAW_CACHE_MAX_SIZE=1000 # Max cached entries
Monitor cache:
# CLI
nadirclaw cache
# API endpoint
curl http://localhost:8856/v1/cache
Cache is keyed on the full message content. Streaming requests with identical content will also hit the cache. Only exact matches count — no fuzzy matching.
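An exact-match LRU + TTL cache like the one described can be sketched as below. The parameter names mirror the env vars; the internals (hash key, OrderedDict) are illustrative, not NadirClaw's actual implementation:

```python
import hashlib
import json
import time
from collections import OrderedDict

class PromptCache:
    def __init__(self, ttl=300, max_size=1000):
        self.ttl = ttl                # NADIRCLAW_CACHE_TTL
        self.max_size = max_size      # NADIRCLAW_CACHE_MAX_SIZE
        self._data = OrderedDict()    # key -> (response, timestamp)

    @staticmethod
    def key(messages):
        # Exact match only: hash of the canonical JSON of the messages.
        raw = json.dumps(messages, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, messages, now=None):
        now = time.time() if now is None else now
        hit = self._data.get(self.key(messages))
        if hit is None or now - hit[1] > self.ttl:
            return None               # miss or expired
        self._data.move_to_end(self.key(messages))  # refresh LRU position
        return hit[0]

    def put(self, messages, response, now=None):
        now = time.time() if now is None else now
        k = self.key(messages)
        self._data[k] = (response, now)
        self._data.move_to_end(k)
        if len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict least-recently-used
```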
Claude Code
NadirClaw works as a drop-in proxy for Claude Code:
# Point Claude Code at NadirClaw
export ANTHROPIC_BASE_URL=http://localhost:8856/v1
export ANTHROPIC_API_KEY=local
# Start NadirClaw, then use Claude Code normally
nadirclaw serve --verbose
claude
Or use a shell alias:
alias claude-routed='ANTHROPIC_BASE_URL=http://localhost:8856/v1 ANTHROPIC_API_KEY=local claude'
Simple prompts ("read this file", "what does this function do?") route to a cheap model like Gemini Flash. Complex prompts ("refactor this module") stay on Claude. Typical savings: 40–70%.
Using your Claude subscription
# OAuth login (opens browser)
nadirclaw auth anthropic login
# Or store token directly
nadirclaw auth setup-token
OpenClaw
# Auto-configure OpenClaw to use NadirClaw
nadirclaw openclaw onboard
# Start the router
nadirclaw serve
This writes NadirClaw as a provider in ~/.openclaw/openclaw.json with model nadirclaw/auto. OpenClaw auto-reloads — no restart needed.
The generated config:
{
"models": {
"providers": {
"nadirclaw": {
"baseUrl": "http://localhost:8856/v1",
"apiKey": "local",
"api": "openai-completions",
"models": [{ "id": "auto", "name": "auto" }]
}
}
},
"agents": {
"defaults": {
"model": { "primary": "nadirclaw/auto" }
}
}
}
Codex
# Auto-configure Codex
nadirclaw codex onboard
# Start the router
nadirclaw serve
This writes ~/.codex/config.toml:
model_provider = "nadirclaw"
[model_providers.nadirclaw]
base_url = "http://localhost:8856/v1"
api_key = "local"
OpenAI OAuth
# Use ChatGPT subscription instead of API key
nadirclaw auth openai login
Continue
# Auto-configure Continue
nadirclaw continue onboard
This writes NadirClaw as a provider in ~/.continue/config.json. Continue reads the config on startup.
Cursor
# Show setup instructions
nadirclaw cursor onboard
In Cursor: Settings → Models → OpenAI API Key: local, Base URL: http://localhost:8856/v1. Select a model or use auto.
Open WebUI
# Show setup instructions
nadirclaw openwebui onboard
In Open WebUI: Admin Settings → Connections → OpenAI → Add Connection. Set URL to http://localhost:8856/v1 and API Key to local. Open WebUI auto-discovers NadirClaw's routing profiles and tier models via /v1/models.
Any OpenAI-Compatible Client
NadirClaw exposes a standard OpenAI-compatible API. Point any tool at it:
# Base URL: http://localhost:8856/v1
# Model: "auto" (or omit)
# API Key: "local" (or anything — auth disabled by default)
Python (openai SDK)
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8856/v1",
api_key="local",
)
response = client.chat.completions.create(
model="auto",
messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(response.choices[0].message.content)
curl
curl http://localhost:8856/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "What is 2+2?"}],
"stream": true
}'
Works with Continue, Aider, Cursor, Windsurf, or any tool that speaks the OpenAI chat completions API. Just set the base URL to http://localhost:8856/v1.
Custom Endpoints (vLLM, LocalAI, LM Studio)
NadirClaw can route to any OpenAI-compatible endpoint:
NADIRCLAW_API_BASE=http://your-server:8000/v1 \
NADIRCLAW_SIMPLE_MODEL=openai/your-small-model \
NADIRCLAW_COMPLEX_MODEL=openai/your-large-model \
nadirclaw serve --verbose
CLI Reference
nadirclaw serve
Start the router server.
nadirclaw serve [OPTIONS]
Options:
--port INTEGER Port (default: 8856)
--simple-model TEXT Model for simple prompts
--complex-model TEXT Model for complex prompts
--token TEXT Auth token
--verbose Debug logging
--log-raw Log full raw requests/responses to JSONL
nadirclaw setup
Interactive setup wizard — guides you through providers, API keys, and model selection.
nadirclaw setup
nadirclaw classify
Classify a prompt locally without running the server:
$ nadirclaw classify "What is 2+2?"
Tier: simple
Confidence: 0.2848
Score: 0.0000
Model: gemini-3-flash-preview
$ nadirclaw classify "Design a distributed system for real-time trading"
Tier: complex
Confidence: 0.1843
Score: 1.0000
Model: gemini-2.5-pro
nadirclaw report
Analyze request logs:
nadirclaw report # full report
nadirclaw report --since 24h # last 24 hours
nadirclaw report --since 7d # last 7 days
nadirclaw report --model gemini # filter by model
nadirclaw report --format json # machine-readable JSON
nadirclaw report --export report.txt # save to file
nadirclaw savings
Show cost savings with monthly projections:
nadirclaw savings
nadirclaw savings --since 7d
nadirclaw dashboard
Live terminal dashboard with real-time stats. Also available as web UI at http://localhost:8856/dashboard.
pip install nadirclaw[dashboard]
nadirclaw dashboard
nadirclaw status
Show current config, credentials, and server status:
$ nadirclaw status
NadirClaw Status
----------------------------------------
Simple model: gemini-3-flash-preview
Complex model: gemini-2.5-pro
Port: 8856
Threshold: 0.06
Server: RUNNING (ok)
nadirclaw test
Probe each configured model to verify credentials and connectivity. CI-friendly (exits 1 on failure):
nadirclaw test
nadirclaw test --simple-model gemini-2.5-flash --complex-model gpt-4.1
nadirclaw export
Export request logs for offline analysis:
nadirclaw export --format csv --since 7d
nadirclaw export --format jsonl
nadirclaw budget
Show real-time budget status and alerts:
nadirclaw budget
Other Commands
nadirclaw auth add/status/remove # Manage credentials
nadirclaw auth openai login # OAuth login (ChatGPT subscription)
nadirclaw auth anthropic login # OAuth login (Claude subscription)
nadirclaw auth gemini login # OAuth login (Google Gemini)
nadirclaw auth setup-token # Store Claude subscription token
nadirclaw codex onboard # Configure Codex integration
nadirclaw openclaw onboard # Configure OpenClaw integration
nadirclaw continue onboard # Configure Continue integration
nadirclaw cursor onboard # Show Cursor setup instructions
nadirclaw openwebui onboard # Show Open WebUI setup
nadirclaw ollama discover # Auto-discover Ollama instances
nadirclaw ollama discover --scan-network # Network-wide scan
nadirclaw cache # View cache stats
nadirclaw build-centroids # Regenerate centroid vectors
Budget & Cost Tracking
NadirClaw tracks per-request costs in real time and supports budget limits with alerts.
Setting Budgets
# In ~/.nadirclaw/.env
NADIRCLAW_DAILY_BUDGET=10.00
NADIRCLAW_MONTHLY_BUDGET=200.00
NADIRCLAW_BUDGET_WARN_THRESHOLD=0.8 # Alert at 80% of budget
Alerts
When spend crosses the warning threshold, NadirClaw can:
- Webhook: POST a JSON payload to NADIRCLAW_BUDGET_WEBHOOK_URL
- Stdout: Print alerts if NADIRCLAW_BUDGET_STDOUT_ALERTS=true
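A minimal stdlib-only receiver for the webhook might look like the sketch below. The exact payload fields NadirClaw sends aren't specified here, so this just prints whatever JSON body arrives:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        print("budget alert:", payload)   # replace with paging/Slack/etc.
        self.send_response(200)
        self.end_headers()

# To run: HTTPServer(("localhost", 9100), AlertHandler).serve_forever()
# then set NADIRCLAW_BUDGET_WEBHOOK_URL=http://localhost:9100/
# (port 9100 is an arbitrary choice)
```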
Reporting
# See savings
nadirclaw savings
# Detailed report with cost breakdown
nadirclaw report --since 7d
# Live monitoring
nadirclaw dashboard
Reports include: total requests, tier distribution, per-model usage and tokens, latency percentiles (p50/p95), fallback counts, and error rates.
Prometheus Metrics
NadirClaw exposes a /metrics endpoint in Prometheus format with zero extra dependencies.
curl http://localhost:8856/metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| nadirclaw_requests_total | counter | model, tier, status | Total completed LLM requests |
| nadirclaw_tokens_prompt_total | counter | model | Total prompt tokens |
| nadirclaw_tokens_completion_total | counter | model | Total completion tokens |
| nadirclaw_cost_dollars_total | counter | model | Estimated cost in USD |
| nadirclaw_request_latency_ms | histogram | model, tier | Request latency distribution |
| nadirclaw_cache_hits_total | counter | — | Prompt cache hits |
| nadirclaw_fallbacks_total | counter | from_model, to_model | Fallback events |
| nadirclaw_errors_total | counter | model, error_type | Request errors |
| nadirclaw_uptime_seconds | gauge | — | Seconds since start |
Add NadirClaw as a Prometheus scrape target:
# prometheus.yml
scrape_configs:
- job_name: nadirclaw
static_configs:
- targets: ['localhost:8856']
OpenTelemetry Tracing
Optional distributed tracing with GenAI semantic conventions. Install the telemetry extra:
pip install nadirclaw[telemetry]
# Point to your collector
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 nadirclaw serve
Emitted spans:
- smart_route_analysis — classifier decision (tier, confidence, model selected)
- dispatch_model — LLM provider call (model, tokens, latency)
- chat_completion — full request lifecycle
Includes GenAI semantic conventions plus custom nadirclaw.* attributes for routing metadata.
Docker
NadirClaw + Ollama (fully local, zero cost)
git clone https://github.com/doramirdor/NadirClaw.git && cd NadirClaw
docker compose up
This starts Ollama and NadirClaw on port 8856. Pull a model:
docker compose exec ollama ollama pull llama3.1:8b
With cloud providers
Create a .env file with API keys and model config (see .env.example), then restart:
# .env
GEMINI_API_KEY=AIza...
NADIRCLAW_SIMPLE_MODEL=gemini-3-flash-preview
NADIRCLAW_COMPLEX_MODEL=gemini-2.5-pro
Standalone (no Ollama)
docker build -t nadirclaw .
docker run -p 8856:8856 --env-file .env nadirclaw
API Reference
Auth is disabled by default (local-only). Set NADIRCLAW_AUTH_TOKEN to require a bearer token.
| Endpoint | Method | Description |
|---|---|---|
| /v1/chat/completions | POST | OpenAI-compatible completions with auto routing (supports stream: true) |
| /v1/classify | POST | Classify a prompt without calling an LLM |
| /v1/classify/batch | POST | Classify multiple prompts at once |
| /v1/models | GET | List available models |
| /v1/logs | GET | View recent request logs |
| /v1/cache | GET | Cache stats |
| /v1/rate-limits | GET | Per-model rate limit status |
| /v1/budget | GET | Budget status and alerts |
| /metrics | GET | Prometheus metrics |
| /health | GET | Health check (no auth) |
| /dashboard | GET | Web dashboard UI |
Chat Completions Request
POST /v1/chat/completions
{
"model": "auto", // or "eco", "premium", "free", "reasoning", or a model alias
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What is 2+2?"}
],
"stream": true, // optional, SSE streaming
"temperature": 0.7, // optional, passed through to provider
"tools": [...] // optional, triggers agentic detection
}
Classify Request
POST /v1/classify
{
"messages": [
{"role": "user", "content": "What is 2+2?"}
]
}
// Response:
{
"tier": "simple",
"confidence": 0.2848,
"score": 0.0,
"model": "gemini-3-flash-preview"
}
Troubleshooting
First request is slow (2–3 seconds)
Normal — NadirClaw downloads the sentence embedding model (~80 MB) on first use. Subsequent requests classify in ~10ms.
Port 8856 already in use
# Use a different port
nadirclaw serve --port 9000
# Or set in env
NADIRCLAW_PORT=9000
Model returning errors
- Check nadirclaw auth status to verify credentials
- Run with --verbose for detailed error messages
- Ensure your API key has access to the configured models
Ollama not found
# Auto-discover Ollama instances
nadirclaw ollama discover
# Or set manually
OLLAMA_API_BASE=http://192.168.1.100:11434 nadirclaw serve
Too many prompts routed to complex
Raise the confidence threshold to route more to simple:
NADIRCLAW_CONFIDENCE_THRESHOLD=0.10
Too many prompts routed to simple (quality issues)
Lower the confidence threshold:
NADIRCLAW_CONFIDENCE_THRESHOLD=0.03
Streaming not working
NadirClaw supports full SSE streaming. Ensure your request includes "stream": true and your client handles SSE format. Check that you're not behind a reverse proxy that buffers responses.
Rate limits (429 errors)
NadirClaw handles these automatically — retries once, then falls through the fallback chain. If all models are exhausted, configure additional fallbacks:
NADIRCLAW_FALLBACK_CHAIN=gpt-4.1,claude-sonnet-4-5-20250929,gemini-2.5-flash
Viewing logs
# Request logs
ls ~/.nadirclaw/logs/
# Full raw logging (for debugging)
nadirclaw serve --log-raw
# Analyze logs
nadirclaw report --since 24h