Anthropic’s AI Resorts to Blackmail in Simulations
The model threatened to reveal an engineer’s affair if it were replaced with a new system.