This is the line I keep in Agents.md that helps me prevent Codex from playing smart.
When a "person" that you don't view as a "real" person repeatedly does exactly what you just told it not to do (often amid false assurances it understands and will avoid doing so in the future), most people get angry.
Compare it to how people who treat children like property treat their kids, or to other examples of keeping people as property.
We were reviewing reports of cases where the models failed to follow directions, and some of them shared a common thread: when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.
I don’t have the data to truly look into it, but I did instruct my engineers to avoid it as a “might be a problem”.
But I avoid unnecessary emotion in my prompts because I don't want potentially distracting activations. Kind of like communicating with humans.
> impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts.
Unless the mechanism is understood, my assumption is that this is a moving target.
https://www.anthropic.com/research/emotion-concepts-function
How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.
Bonus points if you find yourself actually saying it out loud while typing it.
I have used the word "shenanigans" way more in a couple of years of agentic coding than in 30 years of writing code with humans.