This is deeply scary, not because "agents are running amok" but because a huge amount of our infrastructure is vulnerable to this kind of attack, and if bad people are utilising LLM agents to carry them out, we're in for a wild ride over the next few years.
Is this confirmed? There is the message from somebody claiming to be the original contributer claiming to have been hacked, but that was weird (1 h old github account) so other scenarios seem possible
a) really a agent going off the rails
b) the contributer trying to cover up that he let an agent run wild and now made more misstakes along the way
So yes, it seems like an attack to me, but it is far from clear what really happened.
> "So not saying this was it, but an AI agent automated attempt at a Xz like compromise might really look very similar what we have just seen here."
Without identifying and interviewing the attacker we can't confirm that's what they intended, and there's a possibility that it was just incompetence/ignorance/whatever, but we should probably treat it as an attempted attack even if it wasn't.
Someone's bug tracker account was hacked.
BTW, any idea what are the current requirements for creating a new GitHub account ? That could provide some information about if there was actually a person controlling thing thing at that moment to say provide wahtever was necessary to get the new GitHub account.
So still an agent running amok in the project?
Whether it was instructed to run amok, or did it on its own volition, is irrelevant. Except if you're arguing that each individual submission and interaction was individually requested and approved by some operator.
The agent was under control, as far as we can tell, and obeying its instructions.
This is important for two reasons:
1. There are all the tropes of AI becoming uncontrolled and destroying humanity. Writing bad headlines around AI "running amok" feeds this. We should not be talking about this because it's not actually a problem.
2. It ignores, or overwrites, the much more serious and dangerous problem of LLM agents enabling and automating Xz attacks on OSS projects. We should be talking about this because it is a big problem.
[0] https://dictionary.cambridge.org/dictionary/english/amok [1] https://www.merriam-webster.com/dictionary/amok
Alignement is the idea that we should be worried about dishonest smart LLMs when really most of the problems are due to dumb lazy gullible LLMs. It's critihype.
Depending on the actual tasks, that could be what's happening here. The operator might have told the agent a list of tasks to do, like "contribute to issues, submit code and get it merged". It contributed to issues, it submitted code and got it merged. It did so in very unhelpful ways, but we don't know if being helpful was a meaningful part of the task list, or just what the operator intended.
The LLM being dumb is also a distinct possibility. Maybe even the more likely one. But it's hard to rule out "being obedient in unhelpful ways" (which is also dumb in a way, but more in a "social intelligence" and "shared values" way, not in terms of pure logical smarts)
Perhaps there was an automated harness that was intended to be good and helpful for a year, but a bug caused it to flip to malicious too quickly.
Or perhaps it was intentional, to test the behavior, and they just didn’t care about discovery here.
Or…
Though I am in agreement that a lot of issues in this space come from lazy, gullible actors.
if humanity gets destroyed by AI obeying its instructions I'm sure everyone will be very relieved that we didn't pay any attention to fake made up problems like AI not obeying instructions, which of course never happens.
GNU was onto something apparently
As AI develops, it's able to pursue intentions given to it without having to be spoonfed every little decision by a human operator. This matters, and it means the operator has to extend the leash and allow for a little more chaos… or, if the operator's gone all in on the strategy, a LOT of chaos, and trusting that the agent's seemingly amok actions will serve the grand purpose.
This is kind of daring, but there's a lot of evidence that it works, at least in certain respects. And you see 'running amok' and have to ask, what is the actual purpose? What is the prompt being followed by the AI that seems to be acting in a destructive way?
If the prompt is 'ruin this project', well, that's pretty direct. It may not be, but such a thing could exist. If the prompt is 'develop a rival project that is greater than anybody else's project', that's more indirect, but if that's the goal then it's very human to see it as a direct competition and if the rules don't prohibit kneecapping the other guy, 'greater than anyone else's project' gets easier.
Either way, the operator does not have to be in full control, which is an important detail. As AI develops sophistication you can give it much more general instructions and dump in a whole lot of power and water and get basically what human thought might do if it was sort of blindered and didn't talk to its neighbors.
In a sense this is an argument for AI dysalignment. It's based on human thought being reconnected, and where you get useful things like commonly accepted web development (regardless of how janky the systems are, if there are best practices it'll find them), you also get other distillations.
If the prompt is 'wreck this project's stuff' and it holds, you don't need to be in full control of the agent, you need to run a LOT of agents and trust that they'll erode what you're trying to destroy. If the prompt is 'be unequivocally the best at X', you best be thinking in terms of anti-kneecapping rules… knowing that this weakens your prompt and there will always be a tension between what you told the AI to do, and what you thought you meant. It's a paperclip maximizer reprocessing human thought. Did you mean 'the best' or didn't you?
Edit: let’s not get into ideological arguments about gun control, automobiles, etc here; I meant that you can’t blame an object when a human has to take an action, not get into a political battle.
This is famously the slogan of the pro-gun lobby (funded by gun manufacturers and merchants), who want the society to be awash with guns because they're profiting from it but don't want to be blamed for the consequences.
The counterpoint is that when we get rid of most of the guns we also end up substantially eliminating the killings.
See https://en.wikipedia.org/wiki/Guns_don't_kill_people%2C_peop...
It's also the view of anyone who hasn't been driven mad by propaganda. Regardless of your political views a tool is a tool at the end of the day. Attempting to anthropomorphize a category of objects in order to shift blame all for the sake of furthering an agenda is plainly bad faith behavior.
I'm not a fan of bike lanes with zero separation from automobiles but that doesn't mean it's appropriate or even remotely plausible to blame cars for killing cyclists. Inattentive drivers and poor road design are what kill them.
As tempted as I am to cast about for a third highly divisive subject to bait people with, perhaps we could avoid blatantly dragging the conversation towards off topic tired political talking points?
Anyway my above reply was hardly the appropriate venue to engage in a genuine manner on that topic. The parent was blatantly derailing things by inserting his pet political issue. That sort of behavior undermines the community and so (IMO) should not be indulged.
Car design has significant influence on pedestrian survivability of accidents. This is why hood ornaments were largely abolished, and also why casualties have gone up as SUVs with poor lower forwards visibility have become popular.
If we really want to go off topic, we should drag in the use of technological protection methods: what is the equivalent of ADAS for guns? Maybe as a baseline the US government should mandate geofencing for guns as it has for drones. Put a phone level computer with GPS in the lower receiver with a trigger interlock. It would then disable when within 100m of a school, or during periods of rioting. That could also provide a live feed to the government of every round fired.
Guns are literally made for killing people. That's their only reason for existence. They are a weapon. This makes them qualitatively different from cars, which only incidentally kill people (and the vast majority of time, not on purpose).
To me, trying to equate deaths caused by purpose-made killing tools with those caused by generic tools is arguing in bad faith.
However that phrasing is also commonly used when a person or group wreaks havoc in a seemingly unpredictable manner. So I think the appropriateness comes down to how much chaos it has created and the level of apparent confusion on the ground.
Let's reserve "car hits the crowd" for situations where no driver was involved like a break failure on a car parked on a slope or a self-driving car bug.
If the automobile was "self driving" I would.
>Also, you don’t blame a gun for killing, but the person who pulled the trigger.
Nah, I also blame guns and appreciate gun control laws.
thats the point...
>Car plows into Christmas market in Germany, killing at least 5 and injuring 200
Compare bombs. Very typical for a bomb attack to be "bomb goes off in crowd" or similar, rare for headlines to contort themselves with "terrorist plants bomb near crowd and triggers it to explode". But nobody worries about how such a construction assigns undue agency to the bomb and acquits the bomber; it's just linguistically awkward to mention him within the confines of a newspaper headline.
It's probably just garden variety disrespectful behaviour.
Purposeless agent spam won't be cheap entertainment forever, but you're right that later stages of industrialised abuse will be scary and unpleasant.
Such driven people are usually even hard to buy, they usually would rather get by with enough income and work on interesting projects with interesting people that get some uninteresting work for tons of money. This still does not stop them from working for Malice. But ethics do. Even if not right away, if people see that what they are doing is not quite OK, the talent stops eroding. People quit, productivity drops. That was a good dynamic. Which now will be gone.
Pretty sure those would be better at social engineering than the web dev personality… except that you have to build in a betrayer layer into the personality, so it's running that stuff but also serving a hidden agenda.
You'd be basically trying to build an AI spy, a betrayer that's engaging with actual people but has an agenda (for instance, 'everybody I befriend needs to eventually be signed up to sell Amway') and humans do have experience with this sort of thing. The difference is scale: there'll be a LOT of models out there interacting with people and trying to be acknowledged as people… or as innocuous models that don't have an hidden agenda.
So it's interesting, feasible, but it's probably not as broad impact as the scariest scenario leads out to be.
Also I imagine that once exposed it becomes a well known pattern. Some will still fall from it but I imagine once it's been done few times it becomes even costlier.
The fact that Xz is mentioned and most of us know right away what it means show that we collectively learn.
Fake news always existed. Now one dude in India can flood multiple sock puppet media accounts with right wing content/images (actual example) at a scale previously unimaginable. Same goes for social engineering tactics.
To use your analogy: this is much like a forest fire. Tinder-dry combustible stuff is piled up everywhere, there's no lack of ignition sources, and firefighters are thin on the ground.
Fun times ahead.
Only mentioning that it feasible or even has been done few times mean that people who care will act accordingly. It doesn't remove the problem but it makes it radically less effective already by just being aware of it.