I don’t buy it. Alignment faking has very little overlap with the motivation to do something with no prompt.
Look at the Hacker News comments on alignment faking and how “fake” of a problem that really is. It’s just more reacting to inputs and trying to align them with previous prompts.
Bruh it's just predicting next token.