No, no, it's been pretty easy with software engineering. I work on two types of projects, and it's very easy to ask Claude for a plan, then have GPT 5.5 rip it to shreds and find legit issues, and vice versa. If both GPT 5.5 and Claude 4.8 can independently create a plan and both find no critical or high-severity issues, then we will be at that point.
I wouldn't say the vice versa is true. GPT 5.5 routinely finds major mistakes made by Opus 4.7, but I've yet to see it work the other way around.
Additionally, running GPT 5.5 on medium sometimes gives me better results than on high. In either mode I still have to push the models in the right direction.