Right, and that's why it's only part of the job. The benchmarks they're currently running consist of handing the AI a detailed spec plus tests to make pass, which isn't really what developing a feature looks like.
Going from a fuzzy, under-defined spec to something well defined isn't solved.
Going from a well-defined spec to verification criteria isn't either.
Once those are in place though, we get https://vinext.io - which, from what I understand, was largely vibe-coded using NextJS's test suite.
> First one that comes to mind is that 100% code coverage in tests means that software is perfect
I agree... but I'm also not sure software needs to be perfect.
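
To make the coverage point concrete, here's a minimal hypothetical sketch (TypeScript, Jest-style test; the function and test names are made up): the single test executes every line, so a coverage tool reports 100%, yet the empty-array case still slips through.

```ts
// Hypothetical example: every line is executed by the test below,
// so line coverage reports 100%, but the function still has a bug.
export function average(xs: number[]): number {
  let sum = 0;
  for (const x of xs) {
    sum += x;
  }
  return sum / xs.length; // bug: returns NaN for an empty array
}

// Jest-style test: exercises every line, never tries the empty array.
test("average of [2, 4] is 3", () => {
  expect(average([2, 4])).toBe(3);
});
```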