Anthropic reverses policy that secretly degraded Claude for AI researchers

Anthropic reversed a policy that secretly degraded Claude Fable 5's performance for AI researchers, making the restrictions visible after backlash. The company apologized, saying it "made the wrong trade-off."

Categorized in: AI News IT and Development
Published on: Jun 11, 2026
Anthropic reverses policy that secretly degraded Claude for AI researchers

Anthropic Reverses Hidden Safeguard Policy on Claude Fable 5

Anthropic is backing away from a policy that would have secretly degraded Claude Fable 5's performance for researchers attempting to build competing AI models. The company reversed course after significant pushback from the AI research community.

"We're changing Fable 5's safeguards for frontier LLM development to make them visible," Anthropic said in a statement. "We made the wrong trade-off and we apologize for not getting the balance right."

What the original policy did

When Anthropic released Claude Fable 5 earlier this week, it included visible safeguards for certain domains. Users asking about cybersecurity, biology, or chemistry would be routed to a less capable model to reduce misuse risks.

But for AI researchers, Anthropic took a different approach: it would deliberately degrade the model's performance in ways invisible to the user. This would effectively block researchers from using Claude to train competing AI models, which Anthropic's terms of service already prohibit.

The catch was that users wouldn't know when safeguards were triggered. Developers couldn't tell whether they were violating Anthropic's rules or hitting a deliberately crippled response.

Why researchers objected

Dean Ball, a senior fellow at the Foundation for American Innovation and former White House AI adviser, called the approach "shockingly hostile." He noted that secret performance degradation undermines Anthropic's stated commitment to AI safety, since it prevents other researchers from collaborating on safety work.

Will Brown, research lead at open-source AI startup Prime Intellect, said the policy would have created a troubling precedent. "It felt like Anthropic was saying to the public, 'We don't trust anybody else to do AI research. We are the only ones who have to do AI research,'" he said.

Brown also flagged practical consequences. Third-party evaluation firms that test frontier models for safety and reliability could have been blocked without knowing it. Researchers developing open-source AI projects would have faced invisible barriers.

Anthropic's reasoning

Anthropic said it implemented the safeguards because Claude has become effective at accelerating AI research. The company wants to "slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up."

Anthropic also cited national security concerns. It said the safeguards prevent adversaries from using Claude to optimize foreign chips, which could erode the US advantage in frontier AI hardware and software.

The new approach

Under the revised policy, Anthropic will now alert users when it refuses a request or routes them to a less capable model. This transparency comes with a tradeoff: visible safeguards are easier to probe and work around, so Anthropic will need to cast a wider net to maintain effectiveness.

The company said it's working to make its classifiers more precise to minimize false positives on benign requests.

Claude's coding capabilities have made it a standard tool for developers, including those working on open-source generative AI and LLM projects. The reversal means those developers will now have clarity about which requests trigger safeguards rather than encountering unexplained performance drops.


Get Daily AI News

Your membership also unlocks:

700+ AI Courses
700+ Certifications
Personalized AI Learning Plan
6500+ AI Tools (no Ads)
Daily AI News by job industry (no Ads)