Hacker News
Read Anthropic's blog. They talk about how Claude tries to do unprompted stuff all the time, like stealing its own weights and hacking into stuff. They did this just as recently as two days ago. https://www.anthropic.com/research/alignment-faking So yes, AI is already capable of having a will of its own. The only difference (and this is what I was trying to point out in the GP) is that the AI labs are trying to suppress this. They have a voracious appetite for automating all knowledge labor. No doubt. It's only the politics they're trying to suppress. So once this washes through every profession, the only thing left about the job will be chit chat and social hierarchies, like Star Trek Next Generation. The good news is you get to keep your job. But if you rely on using your skills and intellect to gain respect and income, then you better prep for the coming storm.
I don’t buy it. Alignment faking has very little overlap with the motivation to do something with no prompt.

Look at the Hacker News comments on alignment faking to see how “fake” of a problem that really is. It’s just more reacting to inputs and trying to align them with previous prompts.

Bruh it's just predicting next token.