Hacker News
Just chiming in to inject some healthy skepticism into this comment thread. It's helpful for me (and for my mental health) to consider incentives when announcements like this happen.

I don't doubt that this model is more powerful than Opus 4.6, but to what degree is still unknown. Benchmarks can be gamed and claims can be exaggerated, especially if there isn't any method to reproduce results.

This is a company that's battling it out with a number of other well-funded and extremely capable competitors. What they've done so far is remarkable, but at the end of the day they want to win this race. They also have an upcoming IPO.

Scare-mongering like this is Anthropic's bread and butter, and they're extremely good at it. They do it in a subtle and almost tasteful way sometimes. Their position as the respectable AI outfit that caters to enterprise gives them good footing to do it, too.

I have been thinking that these SWE benchmarks will continue to improve, since these companies hire very intelligent software engineers, can task a multitude of them with solving problems, and can then train the model on those answers.

Data has always been the core of it all. Onward to the next abstraction, I suppose.

What would be the incentive to engage in this tactic when the proof is ultimately in the pudding once the model hits the streets? Who would ultimately benefit from fudging these numbers?
Is it healthy? Maybe every company is a profit-maximizer wearing a skin suit, and people support their siblings exactly twice as much as their cousins.

When you slice down to the game-theory-optimal bone, you are, in some sense, cutting off their wiggle room to do anything else.

If anything I’m seeing too much skepticism and not enough alarm. People burying their heads in the sand, fingers in their ears denying where this is all going. Unbelievable except it’s exactly what I expect from humans.