It seems harsh to critique guardrails and count them in the scoring when GPT-5.5 seems to have been explicitly whitelisted to remove most of said guardrails. A fairer comparison would be a vanilla GPT account.
I agree fully and hope someone else is able to do this test! For me it was a matter of cost and quotas that stopped me from changing to a new account.
Also just to mention:
Claude guardrails -> that session is terminated.
GPT guardrails -> your whole account is slowed down.
Does it matter when you can’t have the Opus 4.8 guardrails removed? With GPT at least you can, and they’re quick about it.
I mean, yes. Most people aren’t security researchers, and either way it’s apples to oranges at that point if you’re counting “the guardrails stopped me” as a negative for one but not the other.
But should developers be barred from asking an LLM to try to secure their own app? It’s no different from finding exploits...