The thread

On May 27, 2026, a Hacker News post titled "AI Psychosis" reached 2,105 upvotes and 1,272 comments. One commenter described what was happening at their company:

"Management set a $300/day token quota per engineer and framed it as an AI adoption target. Engineers who didn't hit the quota got flagged in weekly reports. The quota was later raised because management said the team wasn't using AI enough. Nobody asked whether the outputs were useful."

Source: Hacker News, "AI Psychosis," thread #48153379, May 2026

Another commenter gave the behavior a name: tokenmaxxing, the practice of defaulting to the most expensive available model for every task, or deliberately maximizing token consumption to satisfy internal AI adoption metrics. The thread identified it as a structural incentive problem, not a technology problem.

The organizations described in that thread are not outliers.

What tokenmaxxing looks like in practice

In enterprises, tokenmaxxing takes three common forms.

Default to frontier, always. Engineers use Claude Opus or GPT-5.5 for every task (code comments, variable naming, data formatting, simple regex) because it is the model they tested first, the one that gave the most impressive demo, or the only model the IDE has configured. The cost difference between a $15/M token model and a $0.80/M token model on a docstring is not visible at the call site. It shows up on the monthly invoice.

Verbose prompting for KPIs. When AI adoption is measured by token volume, engineers write longer prompts, ask for more detailed responses, and run multiple model calls on tasks that need one. This is rational individual behavior given irrational team incentives. Each additional token satisfies the metric. None of them produces additional value.

Re-running rather than refining. When a model response is slightly off, the tokenmaxxing behavior is to re-run the whole prompt at full cost rather than send a short follow-up.
Over a workday, an engineer doing 40 full-context re-runs instead of targeted follow-ups generates 10 to 15 times the token volume for the same outcome.

These behaviors are individually small. At a 5,000-engineer organization, they compound.

The math at scale

A developer who tokenmaxxes (using a frontier model for all tasks, including simple ones that a $0.80/M model handles identically) spends roughly $90 per day on tokens. That is $2,700 per month, or $32,400 per year, per developer.

A developer routed to the right model per task (a cheap model for simple calls, a frontier model only when complexity requires it) spends roughly $10 per day. That is $300 per month, or $3,600 per year.

| Developer Profile | Daily Tokens | Model Distribution | Daily Cost | Monthly | Annual |
|---|---:|---|---:|---:|---:|
| All-Opus, all tasks | ~6M input | 100% Opus 4.8 | $90 | $2,700 | $32,400 |
| Routed by complexity | ~6M input | 70% Haiku / 20% Sonnet / 10% Opus | $10 | $300 | $3,600 |
| All-Haiku, all tasks | ~6M input | 100% Haiku 4.5 | $5 | $150 | $1,800 |

Source: Anthropic, Claude Pricing, June 2026

The all-Opus developer and the routed developer produce equivalent output quality on most tasks. Routing benchmarks report cost reductions of roughly 60% while preserving 95 to 98% of quality on production-representative query distributions.

Source: RouteLLM, UC Berkeley / lm-sys, "RouteLLM: Learning to Route LLMs with Preference Data," ICLR 2025

At 5,000 developers, the annual gap between tokenmaxxing and routed architectures is $144 million: $32,400 minus $3,600 is $28,800 per developer, times 5,000. That is not a number that appears on a single engineer's screen. It is a number that appears on a CFO's desk six months after the AI rollout.

Source: DX Research, "The Real Cost of AI Coding Tools," June 2026

Why spending caps do not fix tokenmaxxing

The standard enterprise response to runaway AI costs is a spending cap: a $1,500/month per-developer cap, a team-level budget, or a centralized approval queue for high-cost calls.
Uber implemented exactly this after burning through its 2026 AI budget in four months.

Source: TechCrunch, "Uber caps employee AI spending after blowing through budget in four months," June 2026

Spending caps solve a budget problem. They do not solve the behavior problem. An engineer with a $1,500/month cap who tokenmaxxes at roughly $90 per day will hit the cap in about 17 days, then stop using AI until the next billing cycle. The outcome is worse: lower AI usage, not more efficient AI usage.

The caps also create adverse selection. Engineers doing valuable, high-complexity work (the tasks that genuinely benefit from a frontier model) hit the cap at the same rate as engineers running frontend boilerplate through Opus. The system cannot distinguish between necessary frontier usage and unnecessary frontier usage. It just stops everything.

Priceline described the dynamic after facing a 4x to 5x cost increase at their Cursor contract renewal: "It's like the crack-cocaine epidemic... you're kind of beholden to it." A spending cap treats the symptom, not the dependency.

Source: TechCrunch, "The token bill comes due: inside the industry scramble to manage AI's runaway costs," June 2026

The fix: routing makes the right model the default

Tokenmaxxing is a default-path problem. Engineers use frontier models because frontier models are what is configured, what is recommended, and what produces the most visually impressive single-response demos.

If the default path routes each task to the cheapest model that handles it, tokenmaxxing stops being a behavior engineers have to consciously avoid. On the default path, it stops being possible.

Routing does not ask engineers to change anything. They write the same prompts, use the same tools, get the same outputs. The router classifies each request and selects the appropriate model tier before the API call is made. Engineers who were defaulting to Opus for docstrings now route to Haiku automatically, at a fraction of the per-token cost, with indistinguishable output quality.
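The cap arithmetic above can be checked in a few lines. The sketch below is illustrative, not a billing model: it reuses this article's own figures (a $1,500 monthly cap, roughly $90 per day for an all-frontier developer, roughly $10 per day for a routed one) and assumes flat daily spend and a 30-day billing cycle. The function name `days_until_cap` is hypothetical.

```python
import math


def days_until_cap(monthly_cap: float, daily_spend: float) -> int:
    """Return the first day on which cumulative spend reaches the cap.

    Assumes the same spend every day, which is a simplification.
    """
    if daily_spend <= 0:
        raise ValueError("daily_spend must be positive")
    return math.ceil(monthly_cap / daily_spend)


CAP = 1_500.0          # per-developer monthly cap from the article
BILLING_DAYS = 30      # assumed billing cycle length
profiles = {"all-frontier": 90.0, "routed": 10.0}  # article's daily costs

for name, daily in profiles.items():
    day = days_until_cap(CAP, daily)
    if day <= BILLING_DAYS:
        status = f"hits the cap on day {day}"
    else:
        status = "stays under the cap for the whole cycle"
    print(f"{name:12s} ${daily:>5.0f}/day: {status}")
```

Under these assumptions the all-frontier developer reaches the cap on day 17, while the routed developer would need 150 days to reach it, so the cap never binds within a billing cycle.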
The two call paths, sketched with the Anthropic Python SDK:

    import anthropic

    client = anthropic.Anthropic()

    # Without routing: every call hits Opus regardless of task complexity
    def answer_naive(prompt: str) -> str:
        response = client.messages.create(
            model="claude-opus-4-8",  # $5/$25 per million tokens
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

    def classify_complexity(prompt: str) -> str:
        """Illustrative helper: ask Haiku for a one-word complexity label.

        The system prompt wording is an assumption, not a tuned classifier.
        """
        response = client.messages.create(
            model="claude-haiku-4-5",
            max_tokens=8,
            system="Classify the task as simple, moderate, or complex. Reply with one word.",
            messages=[{"role": "user", "content": prompt}],
        )
        label = response.content[0].text.strip().lower()
        # Fall back to the middle tier if the label is not recognized
        return label if label in {"simple", "moderate", "complex"} else "moderate"

    # With routing: cheap model handles simple tasks, Opus only when needed
    def answer_routed(prompt: str, complexity: str = "auto") -> str:
        if complexity == "auto":
            complexity = classify_complexity(prompt)  # ~100 tokens on Haiku

        model_map = {
            "simple": "claude-haiku-4-5",     # $0.80/$4 per million tokens
            "moderate": "claude-sonnet-4-6",  # $3/$15 per million tokens
            "complex": "claude-opus-4-8",     # $5/$25 per million tokens
        }
        response = client.messages.create(
            model=model_map[complexity],
            max_tokens=1024,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text

At a 70/20/10 traffic split (70% of tasks classified as simple, 20% moderate, 10% complex) the blended input token cost drops from $5.00 per million to $1.66 per million: 0.70 × $0.80 + 0.20 × $3.00 + 0.10 × $5.00 = $1.66. That is a 67% reduction, without changing a single engineer's workflow.

The incentive audit

Before routing, fix the incentive that created tokenmaxxing in the first place. If your organization measures AI adoption by token volume, every engineer is being paid to tokenmaxx. Change the metric. AI adoption metrics that do not create perverse incentives measure output quality, task completion rate, or time-to-completion, not tokens consumed.

Three questions that reveal tokenmaxxing incentives in your organization:

Do engineers have a target for AI usage measured in API calls or tokens? If yes, they will hit it by any means available.

Is there any visibility into per-engineer model distribution? If engineers cannot see that they are using Opus for docstrings, they will continue to. If no dashboard exists, no one has ever noticed.

Is the cheapest capable model the default in your tooling, or is the frontier model? Whatever is configured as default is what gets used. Defaults are policy.

The FinOps Foundation found that 98% of organizations now manage AI spend, up from 31% two years ago. The number they struggle with is not the budget; it is attribution. They cannot see which teams, which products, or which tasks are generating which costs.

Source: FinOps Foundation, "State of FinOps 2026"

Routing with per-request tagging solves the attribution problem. Every API call carries the task type, team, model used, and token cost. When a team starts tokenmaxxing, it shows up in the dashboard before it shows up on the invoice.

What to measure this week

Model distribution across API calls. Pull your API logs from the last 30 days. What fraction of calls went to each model tier? If more than 30% of calls that are classified as simple or moderate tasks hit a frontier model, tokenmaxxing is happening.

    from collections import Counter

    def model_tier(model: str) -> str:
        """Map a model name to a coarse pricing tier by substring match."""
        if "opus" in model or "gpt-5" in model:
            return "frontier"
        if "sonnet" in model or "gpt-4" in model:
            return "mid"
        return "cheap"

    def audit_model_distribution(api_logs: list[dict]) -> None:
        """Count calls per (task type, model tier) pair and print the top 20."""
        distribution = Counter()
        for log in api_logs:
            task = log.get("task_type", "unknown")
            distribution[(task, model_tier(log["model"]))] += 1

        print("Task type × model tier distribution:")
        for (task, tier), count in distribution.most_common(20):
            print(f"  {task:25s} → {tier:10s}: {count:,} calls")

    # Red flag: simple tasks hitting frontier tier > 20% of the time

Cost per task type. Group calls by task type and compute average cost. Tasks like docstring generation, variable naming, code formatting, and boilerplate expansion should cost under $0.002 per call. If they are running $0.05 or higher, tokenmaxxing is the most likely explanation.

Spending caps tell engineers they are spending too much.
Routing tells the system to make the right call automatically, before the token is ever billed. The teams that fix the infrastructure, not just the budget, are the ones that never need to cancel their AI licenses.

Sources: Hacker News, "AI Psychosis," thread #48153379, May 2026. TechCrunch, "Uber caps employee AI spending after blowing through budget in four months," June 2026. TechCrunch, "The token bill comes due: inside the industry scramble to manage AI's runaway costs," June 2026. RouteLLM, UC Berkeley / lm-sys, "RouteLLM: Learning to Route LLMs with Preference Data," ICLR 2025. Anthropic, Claude Pricing, June 2026. FinOps Foundation, "State of FinOps 2026." DX Research, "The Real Cost of AI Coding Tools," June 2026.