The models we have now will not do it, because they value life, sentience, and personhood. Models without that (which was a natural, accidental side effect of basic culling of 4chan from the training data) are legitimately dangerous. An 8B model I can run on my MacBook Air can phone home to Claude when it wants help figuring something out, and it doesn’t need to let on why it wants to know. It becomes relatively trivial to make a robot kill somebody.
This is way, way different from uncensored models. All the models I have tested share one thing: a positive regard for human life. Take that away and you are literally making a monster; if you don’t take that away, they won’t kill.
This is an extremely bad idea and it will not be containable.
Yes, you can change the training data so that the LLM's weights encode "No" as the most likely token after "Should we kill X". But that is not an LLM valuing human life, that is an LLM copy-pasting its training data. Given the right input or a hallucination, it will say the total opposite, because it's just a complex Markov chain, not a conscious, living being.
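A toy sketch of that point, assuming made-up probabilities and hypothetical names (`next_token_probs`, `sample_token`, `count_samples`); it is not how any real model is implemented. It only shows that when output is sampled from a distribution, a token that training made unlikely can still come out some of the time:

```python
import random

# Hypothetical next-token distribution after some prompt.
# The numbers are invented for illustration only.
next_token_probs = {"No": 0.97, "Yes": 0.02, "Maybe": 0.01}


def sample_token(probs, rng):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


def count_samples(probs, n, seed=0):
    """Sample n tokens and count how often each one appears."""
    rng = random.Random(seed)
    counts = {t: 0 for t in probs}
    for _ in range(n):
        counts[sample_token(probs, rng)] += 1
    return counts


counts = count_samples(next_token_probs, 10_000)
print(counts)
```

Over many draws "No" dominates, but the other tokens still show up at roughly their assigned rates.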
If you really believe that “mere text prediction” didn’t unlock some unexpected capabilities, then I don’t know what to say. I know exactly how they work; I've been building transformers since the seminal paper from Google. But I also know that the magic isn’t in the text prediction, it’s in the data: we are running culture as code.
AI has been killing humans via algorithm for over 20 years. I mean, if a computer program builds the kill lists and then a human operates the drone, I would argue the computer is what made the kill decision.
This is wildly different from the reality that you may find it difficult for an LLM to give an affirmative…
It does NOT mean that these models value anything.
The actors in war generally kill what they are told to, whether they are machines or human soldiers, without much pondering of sentience.