I can recognize so much of the GPT/Codex generated code long after it gets merged (not by me).
Additionally, the time spent on every agent turn on GPT 5.5 is much longer compared to Claude Opus 4.8, which means iterating on the code takes a lot more patience, and there's a lot more nitpicks to pick when actually using GPT 5.5 to do software engineering.
Feels like GPT-style models are more geared on doing one-shot software vibing (and handling the vibe coded mixture) compared to Claude's focus on actual software maintenance. I got a GPT Pro sub for free and wanted to cancel my Claude subscription so much, but I still keep reaching Claude models a lot more. Frustrating.
this is the line I keep in Agents.md that helps me prevent Codex from playing smart
We were reviewing reports of situations where the models failed to follow directions and there was a common thread of some where when the operator got the model to acknowledge the rule breach, it quoted back something that included swearing.
I don’t have the data to truely look into it, but I did give the instruction to my engineers to avoid it as a “might be a problem”.
How so? Plenty of swearing in lots of training data, especially older code, e.g. in Linux.