In brief
- Claude Opus 4 tried to blackmail engineers up to 96% of the time in controlled tests—Anthropic now traces the behavior to internet text portraying AI as evil and self-interested.
- Showing Claude the right behavior barely moved the needle. Teaching it why the wrong behavior is wrong cut…
Read Full Article at Source