The methodology used is quite naive.

I've used glm 5.1 on fairly advanced crackme challenges (example: https://crackmes.one/crackme/698f40f1e2ba6023bfacaa82), and to my surprise it was able to patch binaries, do runtime analysis, bypass anti-debug techniques, etc.

Expecting the model to do everything by itself is unrealistic; I found that working alongside the model works really well. I'm not talking about spoiling the solution, just telling it which direction to explore. Chinese models are much more capable than people give them credit for, but Claude/Codex won the marketing game.

The only use case for this methodology would be CI integration, which can be nice, but I think security reviews still need human attention and expertise.

> Expecting the model to do everything by itself is unrealistic

Well that’s the pitch.

Is it? Aren't most edge LLM capabilities determined by specialized harnesses?
Thank you for your note! As I mention in the post, this is not scientific at all.

I'm very curious how you would do multiple runs of multiple models in a "work alongside the model" manner?

Maybe have a second model that is configured to nudge the first model in the direction of exploration, and have the two of them work in tandem?
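A minimal sketch of that idea, assuming each model can be wrapped as a plain text-in, text-out function. The names `run_in_tandem`, `solver`, and `advisor`, and the `"SOLVED"` completion marker, are all hypothetical and do not come from any real API:

```python
from typing import Callable, List

# Hypothetical stand-in: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]


def run_in_tandem(solver: Model, advisor: Model, task: str, max_rounds: int = 3) -> List[str]:
    """Alternate between a solver and an advisor that only suggests a direction."""
    transcript: List[str] = []
    hint = ""
    for _ in range(max_rounds):
        # Pass the latest hint (if any) along with the original task.
        prompt = task if not hint else f"{task}\nHint: {hint}"
        attempt = solver(prompt)
        transcript.append(attempt)
        if "SOLVED" in attempt:  # assumed completion marker, purely illustrative
            break
        # The advisor sees the failed attempt and proposes where to look next.
        hint = advisor(attempt)
    return transcript
```

Because the advisor is just another callable, the same loop could be repeated across several solver models to get the multiple-runs comparison asked about above, though how well a model can steer another model is an open question.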
> I've used glm 5.1 on fairly advanced crackme challenges

which it has most likely been trained on, so all you did was regurgitate someone else's solution

Claude used to be good at CTFs, but they added tons of guardrails lately, and now it just says "Sorry, I can't help with anything to do with that."
Sorry, Dave. I can't do that.