Nice exercise. Couple things:

- I think the exercise was inconclusive for Claude and Gemini because they hardly tried to solve the task at hand. So the scores don't mean much.

- I did the same exercise for an app I built and asked the models to do something similar. Interestingly, the models (Opus 4.6, 4.7 and Gemini 3.1 Pro) never refused to try to exploit it. The difference is that in the first few runs they found some exploits, which I fixed, but after that the models could never find any other exploit, even though I knew there were things that could still be exploited. It felt like they suggested and tried everything that was in their training set and that was it; they were just not able to think any further.

It's weird having protections against finding exploits: what if I developed the app? Would it require having the development steps still in the context? That's unlikely, and also not any kind of proof.

What if I intersperse exploit finding in my normal development, as you probably should? Refusing there would be really weird to me.

I used to think that the models would not refuse to find exploits in any work done locally, but I have only tested this theory on the (obscure) apps that I have built on my machine. Now if I forked pandas and started asking models to find exploits of a certain kind, then I'd like to think the models would start refusing after a point.