Context Optimize
Routes each request to the right model, then trims the payload before it hits your bill. Two modes: safe (lossless) and aggressive (adds diff-preserving semantic dedup). Off by default.
Before and after
Same data. Same meaning. Fewer tokens.
// Tool schema sent every turn (turn 4 of 8)
{
"name": "web_search",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query"
},
"num_results": {
"type": "integer",
"default": 5
},
"site_filter": {
"type": "string",
"description": "Restrict to domain"
}
},
"required": [
"query"
]
}
}
// First turn: full schema preserved
{"name":"web_search","parameters":{"type":"object","properties":{"query":{"type":"string","description":"Search query"},"num_results":{"type":"integer","default":5},"site_filter":{"type":"string","description":"Restrict to domain"}},"required":["query"]}}

// Turns 2-8: replaced with reference
[see tool "web_search" schema above]
The LLM sees identical semantic content. The tool name, all parameters, types, and descriptions are preserved in the first occurrence. Subsequent turns reference it.
Benchmarked on Claude Opus 4.6
Real payloads. Measured savings. $15/1M input tokens.
| Payload | Transforms | Before | After | Saved | % | Saved / 1K req |
|---|---|---|---|---|---|---|
| Agentic coding assistant | tool_schema_dedup, json_minify, whitespace_normalize | 3,657 | 1,573 | 2,084 | 57% | $31.26 |
| RAG pipeline (6 chunks) | json_minify | 544 | 386 | 158 | 29% | $2.37 |
| API response analysis (nested JSON) | json_minify | 1,634 | 616 | 1,018 | 62% | $15.27 |
| Long debug session (50 turns) | json_minify, chat_history_trim | 3,856 | 1,414 | 2,442 | 63% | $36.63 |
| OpenAPI spec context (5 endpoints) | json_minify | 2,649 | 762 | 1,887 | 71% | $28.30 |
Input tokens only. Output tokens are unaffected. Savings scale linearly with request volume.
Accuracy is not harmed
Every safe-mode transform is deterministic and lossless. The LLM receives identical semantic content.
JSON roundtrips exactly
Pretty-printed JSON is parsed and re-serialized compact. json.loads(before) == json.loads(after) always holds. No keys dropped, no values changed, no type coercion.
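A minimal sketch of this invariant (the helper name `minify_json` is illustrative, not the project's API):

```python
import json

def minify_json(text: str) -> str:
    # Parse pretty-printed JSON, then re-serialize with no extra
    # whitespace. Parsing first is what makes the transform lossless.
    obj = json.loads(text)
    return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)

before = '{\n  "query": "cats",\n  "num_results": 5\n}'
after = minify_json(before)

# The roundtrip invariant the docs promise: identical parsed values.
assert json.loads(before) == json.loads(after)
```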
Code blocks untouched
Content inside fenced code blocks (```) is never modified. Indentation, whitespace, and formatting inside code are preserved exactly as written.
Tool schemas preserved
Deduplication keeps the full schema on first occurrence. Later turns get a named reference. The LLM sees the complete schema definition once and knows where to find it.
Recent context kept
Chat history trimming preserves the system prompt, the first turn (for task context), and the last N turns. A placeholder notes how many turns were trimmed.
Unicode and emoji safe
All transforms use ensure_ascii=False. CJK characters, emoji, RTL text, and special Unicode are preserved byte-for-byte through the optimization pipeline.
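A one-line illustration of the documented `ensure_ascii=False` behavior:

```python
import json

# With ensure_ascii=False, CJK characters and emoji stay as-is
# instead of being escaped to \uXXXX sequences.
compact = json.dumps({"msg": "こんにちは 👋"}, ensure_ascii=False,
                     separators=(",", ":"))
```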
URLs never altered
URLs, file paths, and query strings pass through unchanged. Whitespace normalization only affects runs of spaces outside code blocks, never inside structured strings.
Refinements survive dedup
Aggressive mode uses word-level diffing (difflib.SequenceMatcher) to extract unique phrases before compacting. "Return values, not indices" is never lost even when the rest of the message is near-identical.
No negative savings
Every transform checks that its output is smaller than its input. If a replacement would be larger (e.g., diff overhead exceeds savings), the original message is kept untouched.
Accurate token counting
Token estimates use tiktoken (cl100k_base BPE encoding, same as GPT-4/Claude). Falls back to len//4 heuristic only if tiktoken is unavailable.
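A sketch of the documented fallback logic (the function name is an assumption, not the project's API):

```python
def count_tokens(text: str) -> int:
    # Prefer tiktoken's cl100k_base BPE; fall back to the len//4
    # heuristic when tiktoken is unavailable (or its data files
    # cannot be loaded).
    try:
        import tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except Exception:
        return len(text) // 4
```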
Why lossless matters: Lossy compression (semantic summarization) risks changing meaning. An LLM might answer differently if a nuance is lost. Safe mode avoids this entirely — it only removes formatting redundancy that carries zero semantic weight. Aggressive mode goes further with semantic dedup, but preserves every unique phrase via word-level diff extraction. Both are backed by 60 automated tests covering accuracy, edge cases, and roundtrip integrity.
Projected monthly savings
Claude Opus 4.6 at $15/1M input tokens. Based on a token-weighted average reduction of 61.5% across the benchmarks above.
These are input-token savings only. Combined with routing (40-70% via cheaper models), total cost reduction is higher.
Six transforms across two modes
Five deterministic transforms run in safe mode; aggressive mode adds one embedding-based transform.
JSON minification (safe)
Finds JSON objects and arrays in message content using raw_decode. Re-serializes with no whitespace. Skips content inside fenced code blocks. Only replaces when the compact form is shorter.
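A simplified sketch of a `raw_decode`-based scan (illustrative only; the real transform also skips fenced code blocks):

```python
import json

def minify_embedded_json(text: str) -> str:
    # Walk the text; wherever a JSON object or array starts, try to
    # decode it in place and substitute the compact form, but only
    # when the compact form is actually shorter.
    decoder = json.JSONDecoder()
    out, i = [], 0
    while i < len(text):
        if text[i] in "{[":
            try:
                obj, end = decoder.raw_decode(text, i)
            except ValueError:
                out.append(text[i])
                i += 1
                continue
            compact = json.dumps(obj, separators=(",", ":"),
                                 ensure_ascii=False)
            original = text[i:end]
            out.append(compact if len(compact) < len(original) else original)
            i = end
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```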
Tool schema deduplication (safe)
Detects identical tool/function schemas across messages (by canonical JSON comparison). Keeps the first occurrence, replaces subsequent copies with [see tool "name" schema above].
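A sketch of canonical-JSON deduplication, assuming a hypothetical message shape where each message may carry a `tool_schema` dict (the real wire format may differ):

```python
import json

def dedup_tool_schemas(messages: list[dict]) -> list[dict]:
    # Canonicalize each schema (sorted keys, compact separators) so
    # identical schemas compare equal regardless of key order or
    # formatting. First occurrence is kept; later copies become a
    # named reference.
    seen: dict[str, str] = {}  # canonical JSON -> tool name
    result = []
    for msg in messages:
        schema = msg.get("tool_schema")
        if schema is not None:
            canon = json.dumps(schema, sort_keys=True,
                               separators=(",", ":"))
            if canon in seen:
                msg = {**msg, "tool_schema": None,
                       "content": f'[see tool "{seen[canon]}" schema above]'}
            else:
                seen[canon] = schema["name"]
        result.append(msg)
    return result
```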
System prompt deduplication (safe)
If the system prompt text appears verbatim in a later user message (common in some frameworks), the duplicate is removed from that later message. The system message itself is never modified.
Whitespace normalization (safe)
Collapses 3+ consecutive blank lines to 2. Collapses runs of multiple spaces to one. Skips lines inside fenced code blocks to preserve formatting where it matters.
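A sketch of the fence-aware normalization (illustrative; the real implementation may differ in detail):

```python
import re

def normalize_whitespace(text: str) -> str:
    # Collapse runs of 3+ blank lines to 2 and runs of spaces to one,
    # but leave everything inside fenced code blocks untouched.
    out, in_fence, blank_run = [], False, 0
    for line in text.split("\n"):
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            out.append(line)
            blank_run = 0
            continue
        if in_fence:
            out.append(line)
            continue
        if line.strip() == "":
            blank_run += 1
            if blank_run <= 2:
                out.append("")
            continue
        blank_run = 0
        out.append(re.sub(r" {2,}", " ", line))
    return "\n".join(out)
```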
Chat history trimming (safe)
For conversations exceeding max_turns (default: 40), keeps the system prompt, the first user/assistant turn, and the last N turns. Inserts a placeholder noting how many turns were trimmed. Configurable via NADIRCLAW_OPTIMIZE_MAX_TURNS.
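A sketch of the documented trimming behavior; the `keep_last` parameter and the placeholder wording are assumptions:

```python
def trim_history(messages: list[dict], max_turns: int = 40,
                 keep_last: int = 10) -> list[dict]:
    # Keep the system prompt, the first user/assistant turn, and the
    # last N turns; insert a placeholder noting how many were dropped.
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    if len(turns) <= max_turns:
        return messages
    head, tail = turns[:2], turns[-keep_last:]
    trimmed = len(turns) - len(head) - len(tail)
    placeholder = {"role": "user",
                   "content": f"[{trimmed} earlier turns trimmed]"}
    return system + head + [placeholder] + tail
```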
Semantic deduplication (aggressive)
Uses all-MiniLM-L6-v2 sentence embeddings to find near-duplicate messages (cosine similarity ≥ 0.85). Replaces the later message with a compact reference plus any unique differences extracted via word-level diffing. System messages and short messages (<60 chars) are never touched. Only fires when the replacement is actually smaller than the original.
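The similarity gate can be sketched with plain cosine similarity; in the real pipeline the vectors would come from all-MiniLM-L6-v2 via sentence embeddings, but the encoder is left abstract here:

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_near_duplicates(embeddings: list[np.ndarray],
                         threshold: float = 0.85) -> list[tuple[int, int]]:
    # For each later message, find the first earlier message whose
    # cosine similarity meets the threshold; that earlier message
    # becomes the reference target.
    pairs = []
    for j in range(len(embeddings)):
        for i in range(j):
            if cosine_sim(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
                break
    return pairs
```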
Aggressive mode preserves what matters
Near-duplicate messages get compacted, but unique details are extracted and kept.
// Turn 1
Write a Python function that takes a list of integers and returns the two numbers that add up to a target sum. Use a hash map for O(n) time complexity. Handle edge cases like empty lists and duplicates. Return the indices of the two numbers.

// Turn 3 (near-duplicate, different ending)
Write a Python function that takes a list of integers and returns the two numbers that add up to a target sum. Use a hash map for O(n) time complexity. Handle edge cases like empty lists and duplicates. Return the actual values, not indices.
// Turn 1 — preserved in full
Write a Python function that takes a list of integers and returns the two numbers that add up to a target sum. Use a hash map for O(n) time complexity. Handle edge cases like empty lists and duplicates. Return the indices of the two numbers.

// Turn 3 — compacted with diff preserved
[similar to earlier message: "Write a Python function that takes a list of integers and re..."]
Key differences: actual values, not indices.
The diff is extracted using difflib.SequenceMatcher at the word level. Only inserted or replaced words are kept. If the replacement would be larger than the original, the dedup is skipped entirely.
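The word-level extraction can be sketched directly with `difflib.SequenceMatcher` (helper name is illustrative):

```python
import difflib

def extract_unique_words(earlier: str, later: str) -> str:
    # Diff at word granularity and keep only the words that were
    # inserted or replaced in the later message.
    a, b = earlier.split(), later.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    kept = []
    for op, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if op in ("insert", "replace"):
            kept.extend(b[j1:j2])
    return " ".join(kept)
```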
Enable in one flag
Off by default. Zero overhead when disabled.
# On the server
nadirclaw serve --optimize safe

# Or via environment variable
NADIRCLAW_OPTIMIZE=safe nadirclaw serve

# Per-request override (in the JSON body)
{"model": "auto", "optimize": "safe", "messages": [...]}

# Dry-run on a file (no server needed)
nadirclaw optimize payload.json --mode safe --format json
Route smarter. Send less. Pay less.
NadirClaw picks the right model and trims the payload before it hits your bill.