Story Detail of id 47688042 | Liveview Hacker News

saberience 9 hours ago | on: System Card: Claude Mythos Preview [pdf]

I've never understood the point of things like HLE. It doesn't really prove or show anything, since 99.99% of humans can't do a single question on this exam.

That is, it's easy to make benchmarks that humans are bad at; humans are really bad at many things.

Divide 123094382345234523452345111 by 0.1234243131324. Guess what: humans would find that hard, computers easy. But it doesn't mean much.

Humanity's Last Exam (HLE) couldn't be completed by the vast majority of humanity, so it doesn't really capture anything about humanity, or mean much if a computer can do it.

DroneBetter 8 hours ago | parent

The point is that each question is something a specialist in the field would be able to do, but would deem challenging enough that the ability to solve it implies significant general usefulness in that domain.
