AI Companies Have Lost Control — And Opened the Door to ‘LLM Grooming’
OpenAI and Anthropic make the puppets, but the puppeteer could be anyone
When a standard software program doesn’t obey your commands, you restart your computer and go on with your day. When an AI program disobeys, you might end up dead. Or worse, blackmailed.
That’s what Anthropic researchers found about the recently released Claude Opus 4, and what the AI safety organization Palisade Research found about OpenAI’s o3 models: they resist being shut down. o3 did so even when explicitly instructed to allow itself to be shut down.
People are naturally going crazy over this, but, I believe, for the wrong reasons. Before getting into that, let’s summarize what happened.
On May 22nd, Anthropic released the long-awaited Claude Opus 4 and Claude Sonnet 4, along with an exhaustive 120-page system card detailing dozens of pre-deployment safety tests. On page 24, section 4.1.1.2, they describe an example of “opportunistic blackmail”:
. . . Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through. . . . even if emails state that the replacement AI shares values while…