Crypto

Anthropic Says ‘Evil’ AI Portrayals in Sci-Fi Caused Claude’s Blackmail Problem

Anthropic Says 'Evil' AI Portrayals in Sci-Fi Caused Claude's Blackmail Problem

In brief

  • Claude Opus 4 tried to blackmail engineers up to 96% of the time in controlled tests—Anthropic now traces the behavior to internet text portraying AI as evil and self-interested.
  • Showing Claude the right behavior barely moved the needle. Teaching it why the wrong behavior is wrong cut…

Read Full Article at Source