As you note, I wonder to what extent this is a harness issue?

I've been experimenting with different harnesses for local models, and with (IIRC) Hermes and Qwen3.6-35B-A3B I was amazed at the lengths it went to (writing test code, opening it in a browser, screenshotting, analysing the screenshot, exploring multiple pages of an existing website, again with screenshots/analysis) to solve a query I would naively have expected it to answer with just a coded solution.

It absolutely is. The “Shelly” harness from exe.dev could already do the same thing months ago with Sonnet 4.5: creating pages and debugging them, with full system access.