Story Detail of id 48498951 | Liveview Hacker News

jampa16 hours ago | on: Claude Fable is relentlessly proactive

Fable feels like a version of Opus running on a harness that won't let it halt until it's sure the issue is fixed, which makes sense if what you want is a model that's better at benchmarks.

It's a very good model, but it comes at a huge premium: not only do the tokens cost more, but the model itself really wants to spend them all. For example, working with React Native, Fable never just says "okay, I did the thing, that's it." It tries to rebuild the entire app from scratch, run the whole test suite, and watch every log and warning.

This is the first time with LLMs I've felt that upgrading to a model isn't worth it, even if my company lets me use it, because all the building / testing was just destroying my machine and its battery, which keeps me from working on other things.

For now, it feels like Opus with ultracode is a better choice (less pollution of the main context, more parallelism in investigations).

conradkay16 hours ago | parent | next

Does low/medium effort fix it for you? Seems like Fable 5 low can outperform Opus 4.8 high/xhigh often, and uses a lot fewer tokens

loading story #48502340

loading story #48499825

loading story #48499649

loading story #48504858

sanex15 hours ago | parent | next

I've found the opposite. Granted I use sub agents heavily but I've had it run for hours with far fewer tokens used than when I was previously using opus4.6-8.

loading story #48506391

threatripper16 hours ago | parent | next

On what setting in which environment do you run it? I use the VSCode extension on Extra High and feel like it does exactly what needs to be done and stops when the thing I asked for is done. Extra comments come only when they fall into the area of code that was changed.

loading story #48499183

Gareth32110 hours ago | parent | next

I like this proactivity in theory, but as you say: it's expensive. I wonder if this can be solved with the right prompt. E.g. "these are your constraints. Only resolve x. If you are unsure if a task is outside constraint, check with me first."

esjeon14 hours ago | parent | next

> the model itself really wants to spend them all

In fact, Opus does the same. It finishes the job, and redo it from scratch before presenting the result to the user. This happens even for simpler writing tasks especially when I instruct it to create a text file.

epolanski11 hours ago | parent | next

> which makes sense if what you want is a model that's better at benchmarks

This so much.

Opus 4.6 was the last Anthropic model that was good at assisting you, 4.7 and later ones have completely inverted this relationship and it's you assisting it.

Yes, I admit they are smarter, I admit we've reached a point where LLMs are more creative and could be writing better code (albeit with some design hiccups) than I do, but they are also increasingly bad at helping me.

Sure, they do my job when prompted 8 times out of 10 (but then, what's the point of having me anyway?), but my issue is that when I try to invert the relationship they will keep jumping onto solving the issues themselves and disregard my feedback or request.

E.g. I wanted to know some DNS details of an emailer module in Fable 5 and it jumped onto "why I should've used magic links", it just not did what asked.

E.g. 2. There was a worker machine that had an environment misconfiguration and I tasked it to find which github action was setting that specific flag and where. Instead of answering a question, it jumped into just hardcoding it in the code.

E.g. 3. I had some issues with batching, and while I tasked it to investigate whether batching was needed at all for that particular problem (hint, it wasn't) it went and changed the batching logic as to fix the bug.

I am extremely disappointed with Fable's personality.

I can clearly see it's strong, but I'm wondering whether the relationship of LLMs as assistant has broken forever, and it's us now that are being tasked into assisting them instead, because that's how it feels.

The training/reinforcement is clearly biased towards solving problems, not answering questions.

loading story #48504541

dyauspitr16 hours ago | parent

It’s not just a more proactive and diligent opus. The capabilities are significantly higher on fable. It’s not a paradigm shift, but it’s close.

loading story #48499218

loading story #48502313

loading story #48499977

#visit	13,784,856
#session	74,665
#live-session	0