Anthropic identifies three operational changes that degraded Claude's performance
Anthropic published a technical post-mortem on April 23 identifying three separate changes to Claude's system configuration that caused the model to perform worse over the past six weeks. The company said it has reverted or fixed all three issues.
Developers and power users had reported a measurable shift in Claude's behavior since early April. Across GitHub, X, and Reddit, users described the phenomenon as "AI shrinkflation": weaker reasoning depth, more hallucinations, and tokens wasted on verbose outputs that fell short of the quality they had come to expect.
Third-party benchmarks appeared to validate the complaints. BridgeMind reported Claude Opus 4.6's accuracy dropped from 83.3% to 68.3%, causing its ranking to plummet from No. 2 to No. 10. A detailed audit by Stella Laurenzo at AMD's AI group analyzed 6,852 Claude Code sessions and found the model increasingly chose simplistic fixes over correct ones.
Anthropic initially pushed back against claims that it had intentionally "nerfed" the model to manage demand. The post-mortem showed the actual causes were operational and unrelated to the underlying model weights.
What went wrong
Default reasoning effort change: On March 4, Anthropic lowered the default reasoning effort from high to medium for Claude Code to reduce UI latency. The interface no longer appeared "frozen" while the model thought, but complex tasks suffered noticeably.
Caching logic bug: A March 26 optimization meant to clear old "thinking" from idle sessions contained a critical flaw. Instead of pruning history once after an hour of inactivity, the system cleared it on every turn, causing the model to lose short-term memory and repeat itself.
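The post-mortem describes this bug in prose only, so the sketch below is a minimal illustration under assumed names (`Session`, `prune_buggy`, `prune_fixed` are hypothetical, not Anthropic's code). The flaw amounts to losing the idle-time guard from the pruning check:

```python
import time
from dataclasses import dataclass, field

IDLE_THRESHOLD_S = 3600  # intended: prune only after an hour of inactivity

@dataclass
class Session:
    thinking_history: list = field(default_factory=list)
    last_active: float = field(default_factory=time.time)

def prune_buggy(session: Session) -> None:
    # The flawed optimization: the idle check was effectively lost, so
    # "thinking" history was cleared on every turn and the model lost
    # its short-term memory mid-conversation.
    session.thinking_history.clear()

def prune_fixed(session: Session, now: float) -> None:
    # Intended behavior: clear history once, only after the session has
    # sat idle for more than an hour.
    if now - session.last_active > IDLE_THRESHOLD_S:
        session.thinking_history.clear()
```

A one-line conditional is the entire difference between freeing stale memory and discarding live context, which is why this class of bug is easy to ship and hard to spot without behavioral evals.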
System prompt constraints: On April 16, Anthropic added instructions limiting text between tool calls to 25 words and final responses to 100 words. The verbosity restrictions caused a 3% drop on coding-quality evaluations.
The issues affected Claude Code, the Claude Agent SDK, and Claude Cowork. The Claude API remained unaffected.
Operational changes to prevent recurrence
Anthropic is implementing four operational changes in response to the incident:
- Internal dogfooding: More staff will use the exact public builds of Claude Code to catch issues before users experience them.
- Enhanced evaluation suites: The company will run broader per-model evaluations and ablation tests for every system prompt change to isolate the impact of specific instructions.
- Tighter controls: New tooling makes prompt changes easier to audit, and model-specific changes are now strictly gated to their intended targets.
- Subscriber compensation: Anthropic reset usage limits for all subscribers on April 23 to account for wasted tokens and performance degradation.
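The ablation testing mentioned in the second bullet can be sketched simply: run the same evaluation suite with and without a candidate instruction and attribute the score delta to that instruction. The names below (`run_eval`, `ablation_delta`, the toy evaluator) are illustrative assumptions, not Anthropic's tooling:

```python
from typing import Callable

def ablation_delta(run_eval: Callable[[str], float],
                   base_prompt: str,
                   instruction: str) -> float:
    """Score change attributable to one system-prompt instruction:
    positive means the instruction helps, negative means it hurts."""
    with_instruction = run_eval(base_prompt + "\n" + instruction)
    without_instruction = run_eval(base_prompt)
    return with_instruction - without_instruction

# Toy evaluator standing in for a real eval suite: pretend a strict
# word limit costs 3 points of coding accuracy.
def toy_eval(prompt: str) -> float:
    return 80.0 - (3.0 if "100 words" in prompt else 0.0)

delta = ablation_delta(toy_eval, "You are a coding assistant.",
                       "Keep final responses under 100 words.")
# delta is -3.0: the instruction hurts, so it would be flagged for review.
```

Running this check per instruction, per model, is what lets a team trace a small aggregate regression back to one specific line of a system prompt.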
Anthropic said it will use a dedicated @ClaudeDevs account on X and GitHub threads to explain the reasoning behind future product decisions and maintain more transparent dialogue with developers.
For operations professionals managing Claude in production, the key takeaway is straightforward: configuration and system prompt changes can degrade model output as significantly as changes to model weights. Monitoring for behavioral shifts and maintaining clear audit trails for operational changes are essential. The incident underscores why AI for Operations requires the same rigor as traditional infrastructure management.
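As a minimal sketch of that monitoring (the function name and 3-point threshold are assumptions, not a prescribed tool), comparing a rolling window of eval scores against a baseline window is enough to catch a regression of the size reported here:

```python
import statistics

def regression_detected(baseline: list[float],
                        current: list[float],
                        max_drop: float = 0.03) -> bool:
    """Flag a behavioral shift when mean eval accuracy falls more than
    `max_drop` (absolute) below the baseline window's mean."""
    return statistics.mean(baseline) - statistics.mean(current) > max_drop

# Accuracy sliding from the low 80s to the high 60s, as the third-party
# benchmarks reported, trips the alarm immediately.
print(regression_detected([0.83, 0.84, 0.83], [0.69, 0.68, 0.68]))  # True
```

Wired to a scheduled eval run, a check like this surfaces configuration-induced regressions days before user complaints accumulate.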