https://github.com/anthropics/claude-code/issues?q=is%3Aissu...
Apparently whatever SWE-bench is measuring isn't very relevant.
I don’t doubt they have found interesting security holes; the question is how they actually found them.
This System Card is just a sales whitepaper, and it confirms what that “leak” from a week or so ago implied.
I suspect it's going to be used to train/distill lighter models. The exciting part for me is the improvement in those lighter models.
Looks like they just built a far larger model with the same quirks as Claude 4. Seems like a super expensive "Claude 4.7" model.
I have no doubt that Google and OpenAI have already done this for internal (or even government) usage.
pick one or more: comically huge model, test time scaling at 10e12W, benchmark overfit