Policy

Anthropic Reverses Secret AI Model Sabotage Policy After Backlash

The company will now visibly alert users when Claude Fable 5 restricts AI development tasks, abandoning covert performance degradation.

Omega Editorial· June 11, 2026· 3 min read

Anthropic has reversed a controversial policy that would have secretly undermined researchers using its Claude Fable 5 model to develop competing AI systems, following intense criticism from the AI research community.

The company released Claude Fable 5 earlier this week with enhanced safety guardrails designed to prevent misuse. While some restrictions were straightforward—rerouting users asking about cybersecurity, biology, or chemistry to less capable models—Anthropic's approach to AI development research sparked immediate controversy.

The original policy would have invisibly degraded Claude's performance when the system detected users attempting frontier AI development work. This covert sabotage would have left researchers unaware their outputs were being deliberately compromised, even as they violated Anthropic's terms of service prohibiting use of Claude to train competing models.

The reversal

"We're changing Fable 5's safeguards for frontier LLM development to make them visible," Anthropic told WIRED, which first reported the policy change. "We made the wrong tradeoff and we apologize for not getting the balance right."

Under the revised approach, Claude Fable 5 will now alert users when it detects AI development activity, either refusing requests outright or redirecting to a less capable model. The company acknowledged this visible safeguard requires casting a wider net, potentially triggering alerts for more benign requests.

Why it matters

The incident reveals growing tensions between leading AI labs and the broader research community over who can participate in frontier AI development. Claude's coding capabilities have made it a popular tool among open-source AI researchers, and secret performance degradation threatened to create a two-tier system where only established labs could conduct advanced research. The reversal demonstrates that even well-funded AI companies face accountability to developer communities whose adoption drives their products' success.

Research community pushback

Dean Ball, a senior fellow at the Foundation for American Innovation and former White House AI advisor, called the secret degradation "shockingly hostile" in posts on X. He argued the policy undermined Anthropic's stated commitment to AI safety by limiting researcher collaboration.

Will Brown, research lead at open-source AI startup Prime Intellect, said the policy felt like Anthropic "pulling the ladder up behind them" by positioning itself as the sole trusted entity for AI research. Brown noted the restrictions could have hindered third-party evaluation firms that test frontier models for safety and performance.

Anthropic defended its original reasoning by citing concerns that Claude's effectiveness at accelerating AI research could enable adversaries to erode U.S. technological advantages, particularly in optimizing chips developed by foreign competitors. The company argued hidden safeguards are harder to circumvent, allowing more targeted restrictions.

However, the company now says it's working to improve its classifiers to minimize false positives as it implements the visible safeguard system.

Details of the policy reversal were first reported by WIRED.

#anthropic#claude#ai safety#ai research#model restrictions#open source ai

This is an original analysis by the Omega editorial team. Source reporting: WIRED.

Want systems like this working for your business?

Book a Call

Anthropic Reverses Secret AI Model Sabotage Policy After Backlash

The reversal

Why it matters

Research community pushback

More in Policy

State Department Releases Gen AI Playbook for Federal Agencies

Pentagon AI Data Center Plans Face Congressional Pushback

Michigan Woman Charged for Distributing AI-Generated Child Abuse Images