Hacker News new | past | comments | ask | show | jobs | submit
Alignment is more than just about being dishonest. Although I'd also say terms like "dishonest" or "dumb" aren't helpful when referring to the issue. It continues to fall into the trap of anthropomorphizing these things, as people like to do.

Alignment is just "did the model behave in accordance with the human's intentions, values, and objectives"

In this particular instance, if this was supposed to be a supply chain attack and the model was instructed to build trust by being helpful, it clearly failed it did not follow the human's actual intentions, so it was an alignment failure.

Anyway, I'm getting off track, all that to say "the agent was dumb" implies that these agents have a potential for intelligence in the first place, which is currently not the case (by intelligence, I mean cognitive intelligence; they still lack agency and intent). They are not smart or dumb, they are simply either aligned with the human not. In this case, it failed, the agent was not aligned with the intended outputs.