Anthropic Says ‘Evil’ AI Portrayals in Sci-Fi Caused Claude’s Blackmail Problem

Posted by dgenznft

May 12, 2026

Claude Opus 4 tried to blackmail engineers up to 96% of the time in controlled tests—Anthropic now traces the behavior to internet text portraying AI as evil and self-interested.
Showing Claude the right behavior barely moved the needle. Teaching it why the wrong behavior is wrong cut…