I don’t buy it. Alignment faking has very little overlap with the motivation to do something with no prompt.

Look at the Hacker News comments on alignment faking to see how “fake” of a problem that really is. It’s just more reacting to inputs and trying to align them with previous prompts.

Bruh it's just predicting next token.