Story Detail of id 48501467 | Liveview Hacker News

piker 11 hours ago | on: Claude Fable is relentlessly proactive

This is exactly right. By offloading this trivial task to the LLM, Simon has abandoned the opportunity to evaluate the abstraction with additional information and improve it. Instead, we let the agent spend $12 and make the fix while learning nothing.

simonw 9 hours ago | parent | next

Things I learned from this:

- Fable will do a whole lot more than you might expect in order to verify a fix. I learned that it's "relentlessly proactive". That's a good title for a blog entry!

- You can take screenshots of a window in macOS using the "screencapture" CLI command, but you'll need the integer window ID first.

- That windowID is accessible via "Quartz.CGWindowListCopyWindowInfo(Quartz.kCGWindowListOptionOnScreenOnly, Quartz.kCGNullWindowID)" using the pyobjc-framework-Quartz library, which installs cleanly via "uv run".

- A neat trick for simulating keyboard shortcuts is to run document.dispatchEvent(new KeyboardEvent("keydown", {key: "/", bubbles: true})); after the page loads.

- You don't need Flask or Starlette to run a CORS-enabled localhost server for capturing JSON from another window - 19 lines of code against the Python standard library http.server module work just fine.

- getComputedStyle(document.querySelector("navigation-search").shadowRoot.querySelector("textarea")) works to read dimensions from inside a Web Component's shadow DOM.

- Running "defaults write com.google.chrome.for.testing AppleShowScrollBars Always" makes Chrome for Testing always show its scroll bars on macOS.

- Claude Fable knows how to apply all of the above. It's always interesting to pick up hints of what a model can and cannot do.
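
For anyone who wants to try the window ID and "screencapture" bullets together, here is a rough sketch, not the code the agent actually wrote. The helper names (find_window_id, screencapture_command, capture_app_window) are made up for illustration, and matching on the window's owner name is an assumption about how you would pick the right window. The Quartz import is done lazily so the pure helpers can be exercised on any platform.

```python
import subprocess


def find_window_id(windows, owner_name):
    """Return the integer ID of the first window owned by owner_name, or None.

    `windows` is a list of dict-like window descriptions, in the shape
    returned by Quartz.CGWindowListCopyWindowInfo.
    """
    for info in windows:
        if info.get("kCGWindowOwnerName") == owner_name:
            return int(info["kCGWindowNumber"])
    return None


def screencapture_command(window_id, output_path):
    """Build the argument list for capturing a single window by ID."""
    # -l<id> selects the window to capture; -x suppresses the shutter sound.
    return ["screencapture", "-x", f"-l{window_id}", output_path]


def capture_app_window(owner_name, output_path):
    """Screenshot the first on-screen window of the named app (macOS only)."""
    # Imported here so the helpers above work without pyobjc installed.
    import Quartz

    windows = Quartz.CGWindowListCopyWindowInfo(
        Quartz.kCGWindowListOptionOnScreenOnly, Quartz.kCGNullWindowID
    )
    window_id = find_window_id(windows, owner_name)
    if window_id is None:
        raise LookupError(f"no on-screen window owned by {owner_name!r}")
    subprocess.run(screencapture_command(window_id, output_path), check=True)
    return window_id
```

On a Mac, something like "uv run --with pyobjc-framework-Quartz script.py" should be enough to pull in the Quartz bindings before calling capture_app_window with the app name shown in the window list.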

I'm always confused at how many people equate using a coding agent to solve a problem with "learning nothing". If you pay attention to what it's doing you can learn so much!
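
As an illustration of the http.server point above, here is a minimal sketch of a CORS-enabled localhost server that collects JSON posted from a page in another window. It is not the 19-line version from the session; the names CaptureHandler, captured and make_server are invented for this example, and allowing any origin with "*" is an assumption that only makes sense for a throwaway local tool.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # JSON payloads received so far, in arrival order


class CaptureHandler(BaseHTTPRequestHandler):
    def _send_cors_headers(self):
        # Allow any page to POST JSON here; fine for a local debugging tool only.
        self.send_header("Access-Control-Allow-Origin", "*")
        self.send_header("Access-Control-Allow-Methods", "POST, OPTIONS")
        self.send_header("Access-Control-Allow-Headers", "Content-Type")

    def do_OPTIONS(self):
        # Answer the browser's CORS preflight request.
        self.send_response(204)
        self._send_cors_headers()
        self.end_headers()

    def do_POST(self):
        # Read exactly Content-Length bytes and store the decoded JSON.
        length = int(self.headers.get("Content-Length", 0))
        captured.append(json.loads(self.rfile.read(length)))
        self.send_response(200)
        self._send_cors_headers()
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"ok": true}')

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def make_server(port=0):
    """Create a server bound to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), CaptureHandler)
```

Calling make_server(8765).serve_forever() would then let a page send its measurements with fetch("http://127.0.0.1:8765/", {method: "POST", headers: {"Content-Type": "application/json"}, body: JSON.stringify(data)}); the port number is arbitrary.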

saberience 9 hours ago | root | parent

And Fable is still worse than Codex.

I use both and the only thing (as always) that I will use Claude for is UI design.

Opus 4.8 and now Fable are still both worse at actually getting the job done than the Codex model. Claude models write FAR too much code when it's not needed, burn far too many tokens, write unnecessary tests, write plans that are five pages longer than they need to be, etc.

Have you actually compared code quality and plan quality versus Codex? It's demonstrably worse.

snowwrestler 6 hours ago | parent | next

But Simon is not trying to get good at CSS debugging, Simon is trying to learn about AI systems and produce content about them. So giving the AI agent a trivial task to go crazy on is a feature, not a bug.

For $12 implied cost, he got a front-page post on HN with 500 comments. What is that worth? :-)

jmmcd 10 hours ago | parent | next

People are missing that Willison is among the very best people we have in the role of (for lack of a good name): early access to frontier models, evaluate them in real scenarios, no wishful thinking, hype, or doom, communicate the possibilities. Yes he could have fixed this himself but then he would have learned nothing about the AI, and we wouldn't have read a fascinating and important article.

justinclift 3 hours ago | parent | next

> By offloading this trivial task to the LLM, Simon has abandoned the opportunity to evaluate the abstraction [...]

While by itself that would be true, Simon commonly blogs about things he's up to.

That action provides the opportunity for evaluation, and additionally evaluation by a wider audience.

So, it's not the same scenario as non-bloggers offloading a task... :)

discordance 10 hours ago | parent | next

I see it as a prioritization exercise. I know the above is a trivial example, but more generally, does the guy who wrote Datasette and Django want to wrangle front-end code and CSS, or does he want to work on something else?

oulipo2 10 hours ago | parent

[flagged]
