isn't this insane? why aren't people freaking out? the jump in capability is outrageous. anyone?
If it's so great at software engineering and bug fixing, then why does Claude Code still have 5000+ open bugs?

https://github.com/anthropics/claude-code/issues?q=is%3Aissu...

Apparently whatever SWE-bench is measuring isn't very relevant.

Anthropic needs to show that its models continually get better. If the model showed minimal to no improvement, it would cause significant damage to their valuation. We have no way of validating any of this; there are no independent researchers who can back any of the assertions made by Anthropic.

I don’t doubt they have found interesting security holes; the question is how they actually found them.

This System Card is just a sales whitepaper and just confirms what that “leak” from a week or so ago implied.

It's going to be expensive to serve (also not generally available), considering they said it's the largest model they've ever trained.

I suspect it's going to be used to train/distill lighter models. The exciting part for me is the improvement in those lighter models.

I've been increasingly "freaking out" for about 3 to 4 years, and it seems that the pessimistic scenario is materializing. It looks like it will be over for software engineers in the not-so-distant future. In January 2025 I said that I expect software engineers to be replaced in 2 years (pessimistic) to 5 years (optimistic). Right now I'm guessing 1 to 3 years.
Freak out about what? I read the announcement and thought "that's a dumb name, they sure are full of themselves" – then I went back to using Claude as a glorified commit message writer. For all its supposed leaps, AI hasn't affected my life much in the real world except to make HN stories more predictable.
I think there's no SOTA advance in this one worthy of "freaking out".

Looks like they just built a way larger model, with the same quirks as Claude 4. Seems like a super expensive "Claude 4.7" model.

I have no doubt that Google and OpenAI have already done that for internal (or even government) usage.

I am freaking out. The world is going to get very messy extremely quickly after one or two further jumps in capability like this.
"some model I don't get to use is much better at benchmarks"

pick one or more: comically huge model, test time scaling at 10e12W, benchmark overfit

Well for one, it’s a PDF
Wait until you see real usage. Benchmark numbers do not necessarily translate to real world performance (at least not by the same amount).
Until recently I would have described myself as an AI skeptic. HN has been a great source for cope on the AI subject over the years. You can find nitpicks, caveats, all sorts of reasons to believe things aren’t as significant as they seem. For me Opus 4.5 was the inflection point where I started to think “maybe this isn’t a bubble.” The figures in this report, if accurate, are terrifying.
the time to freak out was 2 years ago.