Story Detail of id 48312185 | Liveview Hacker News

gAI22 hours ago | on: Claude Opus 4.8

4.7 was the first time I had to resort to using the previous version (4.6) for most use cases. Hoping 4.8 rectifies this.

ishurand420 hours ago | parent | next

They only showed the benchmarks it improved on, but it regressed on much more, such as the MRCR benchmark: "On multi-round coreference/context recall tests (often cited as MRCR or long-text retrieval benchmarks), Opus 4.7 reportedly dropped from roughly 78.3% down to 32.2% compared to Opus 4.6."

merlindru22 hours ago | parent | next

Same. 4.7 felt like a definite regression

supern0va22 hours ago | root | parent | next

Interestingly enough, 4.7 actually did regress on a few benchmarks from 4.6, so it's more than just vibes.

gAI22 hours ago | root | parent | next

It seems like a lot of things fed into that. Anthropic couldn't keep up with the compute costs when they got a huge influx of users. (So) effort level defaults got turned down. (Looks like we have direct effort control in the web interface now - thrilled about that!) Adaptive Thinking, while usually cheaper for them, seems less robust than Extended Thinking. And this part is just vibes, but the alignment on 4.7 feels too stiff. I understand wanting the model to push back more, but it seems like 4.7 will push back reflexively in situations where it's just odd.

ACCount3722 hours ago | root | parent

4.7 is a different base model from 4.6, so it's possible that they introduced regressions with pre-training changes, or undercooked the post-training stage.

throwatdem1231117 hours ago | root | parent

4.7 was just them starting down the path to getting prices in line with the actual cost.

Make it dumber. Charge more (by changing the tokenizer). Call it the latest and greatest. Reset expectations.

ruairidhwm17 hours ago | parent | next

I managed to find that Haiku outperformed Sonnet on some tasks...don't want to blog spam but if anyone is interested: https://www.ruairidh.dev/blog/sonnet-4-6-drops-format-rule-o...

petterroea21 hours ago | parent | next

Same. 4.7 has done some incredibly stupid things.

dbbk19 hours ago | root | parent

I think this is more a consequence of the introduction of adaptive thinking and the removal of extended thinking than of 4.7 specifically.

rhubarbtree22 hours ago | parent | next

Same. So happy when I found that option.

gAI22 hours ago | root | parent

Unfortunately, looks like 4.6 is now gone from the web ui.

tanepiper19 hours ago | parent | next

Yep, until 1st June 4.6 is still x1 on Copilot, but it will jump up quite a bit in cost - 4.7 was already highly priced, and the output was frankly terrible.

It still seems that trying to build general models is mostly cost prohibitive - the frontier model providers and resellers are repricing in such a way that the return on investment is dropping as developers and users become more cautious about burning their limits.

I'm still of the opinion that models like 4.6 don't need to be improved on - rather they need to be better integrated with more domain specific models in agentic flows.

dezsirazvan19 hours ago | parent

same!
