This is exactly right. By offloading this trivial task to the LLM, Simon has abandoned the opportunity to evaluate the abstraction with additional information and improve it. Instead, we let the agent spend $12 and make the fix while learning nothing.
Things I learned from this:

- Fable will do a whole lot more than you might expect in order to verify a fix. I learned that it's "relentlessly proactive". That's a good title for a blog entry!

- You can take screenshots of a window in macOS using the "screencapture" CLI command, but you'll need the integer window ID first.

- That windowID is accessible via "Quartz.CGWindowListCopyWindowInfo(Quartz.kCGWindowListOptionOnScreenOnly, Quartz.kCGNullWindowID)" using the pyobjc-framework-Quartz library, which installs cleanly via "uv run".

- A neat trick for simulating keyboard shortcuts is to run document.dispatchEvent(new KeyboardEvent("keydown", {key: "/", bubbles: true})); after the page loads.

- You don't need Flask or Starlette to run a CORS-enabled localhost server for capturing JSON from another window - 19 lines of code against the Python standard library http.server package works just fine.

- getComputedStyle(document.querySelector("navigation-search").shadowRoot.querySelector("textarea")) works to read dimensions from inside a Web Component's shadow DOM.

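- Running "defaults write com.google.chrome.for.testing AppleShowScrollBars Always" sets the macOS scrollbar preference for Chrome for Testing so that scrollbars are always shown.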

- Claude Fable knows how to apply all of the above. It's always interesting to pick up hints of what a model can and cannot do.
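
The window ID lookup and screenshot steps from the list above can be sketched roughly like this. This is a minimal illustration, not the code the agent actually wrote: the helper names (find_window_id, screencapture_command, capture_app_window) are hypothetical, and the two key strings are assumed to match the values of Quartz.kCGWindowOwnerName and Quartz.kCGWindowNumber.

```python
import subprocess
from typing import Iterable, Mapping, Optional

# Assumption: these are the string values of Quartz.kCGWindowOwnerName and
# Quartz.kCGWindowNumber, the keys found in each window-info dictionary.
OWNER_KEY = "kCGWindowOwnerName"
NUMBER_KEY = "kCGWindowNumber"


def find_window_id(windows: Iterable[Mapping], owner_substring: str) -> Optional[int]:
    """Return the ID of the first window whose owning app name matches."""
    needle = owner_substring.lower()
    for info in windows:
        owner = str(info.get(OWNER_KEY, ""))
        if needle in owner.lower() and NUMBER_KEY in info:
            return int(info[NUMBER_KEY])
    return None


def screencapture_command(window_id: int, out_path: str) -> list:
    """Build the argv for screencapture; -l selects a window by its ID."""
    return ["screencapture", f"-l{window_id}", out_path]


def capture_app_window(owner_substring: str, out_path: str) -> bool:
    """macOS only: look up a window via Quartz and screenshot it."""
    import Quartz  # provided by pyobjc-framework-Quartz

    windows = Quartz.CGWindowListCopyWindowInfo(
        Quartz.kCGWindowListOptionOnScreenOnly, Quartz.kCGNullWindowID
    )
    window_id = find_window_id(windows, owner_substring)
    if window_id is None:
        return False
    subprocess.run(screencapture_command(window_id, out_path), check=True)
    return True
```

Saved as a script, something like "uv run --with pyobjc-framework-Quartz script.py" should pull in the dependency on macOS; only capture_app_window needs Quartz, so the two pure helpers can be tried anywhere.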

I'm always confused at how many people equate using a coding agent to solve a problem with "learning nothing". If you pay attention to what it's doing you can learn so much!
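
For anyone curious about the standard library server mentioned in the list above, here is a rough sketch of the idea. It is not the 19-line version from the session; the names CaptureHandler, captured and make_server are made up for illustration, and the wide-open "*" origin is an assumption that only makes sense for a throwaway localhost tool.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # JSON payloads received so far


class CaptureHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self):
        # Allow any page (for example one open in another window) to POST here.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight request.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_POST(self):
        # Read exactly Content-Length bytes and store the decoded JSON.
        length = int(self.headers.get("Content-Length", 0))
        captured.append(json.loads(self.rfile.read(length)))
        body = b'{"ok": true}'
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port=0):
    """Bind to localhost only; port 0 asks the OS for a free port."""
    return HTTPServer(("127.0.0.1", port), CaptureHandler)


# To run it by hand: make_server(8765).serve_forever()
```

The page being inspected could then send its measurements with fetch("http://127.0.0.1:8765/", {method: "POST", headers: {"Content-Type": "application/json"}, body: JSON.stringify(data)}), where 8765 is just an example port.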

And Fable is still worse than Codex.

I use both and the only thing (as always) that I will use Claude for is UI design.

Opus 4.8 and now Fable are both still worse at actually getting the job done than the Codex model. Claude models write FAR too much code when it's not needed, burn far too many tokens, write unnecessary tests, write plans that are 5 pages longer than they need to be, etc. etc.

Have you actually compared code quality and plan quality versus Codex? It's demonstrably worse.

But Simon is not trying to get good at CSS debugging, Simon is trying to learn about AI systems and produce content about them. So giving the AI agent a trivial task to go crazy on is a feature, not a bug.

For $12 implied cost, he got a front-page post on HN with 500 comments. What is that worth? :-)

People are missing that Willison is among the very best people we have in a role that, for lack of a good name, goes like this: get early access to frontier models, evaluate them in real scenarios without wishful thinking, hype, or doom, and communicate the possibilities. Yes, he could have fixed this himself, but then he would have learned nothing about the AI, and we wouldn't have read a fascinating and important article.
> By offloading this trivial task to the LLM, Simon has abandoned the opportunity to evaluate the abstraction [...]

While by itself that would be true, Simon commonly blogs about things he's up to.

Blogging about it provides that opportunity for evaluation, and additionally opens it up to evaluation by a wider audience.

So, it's not the same scenario as non-bloggers offloading a task... :)

I see it as a prioritization exercise. I know the above is a trivial example, but more generally, does the guy who created Datasette and co-created Django want to wrangle front-end code and CSS, or does he want to work on something else?