Story Detail of id 47689788 | Liveview Hacker News

gck19 hours ago | on: System Card: Claude Mythos Preview [pdf]

What is the automated verification of correct output and who defines that?

But before verification, what IS correct output?

I understand that the SWE process is unique in that some automation can verify some inputs and outputs, but this reasoning falls into the same fallacies we had before the AI era. The first one that comes to mind is that 100% code coverage in tests means the software is perfect.
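To make the coverage point concrete, here is a minimal sketch. The function `average` and its test are hypothetical names made up for illustration, not taken from any benchmark or from the system card. The single test executes every line of the function, so a line-coverage tool would report 100%, yet an obvious input still crashes it.

```python
def average(values):
    """Return the arithmetic mean of a list of numbers (hypothetical example)."""
    return sum(values) / len(values)


def test_average():
    # This one assertion runs every line of average(), so line coverage is 100%.
    assert average([2, 4, 6]) == 4


def crashes_on_empty_input():
    # Coverage said nothing about this case: an empty list divides by zero.
    try:
        average([])
    except ZeroDivisionError:
        return True
    return False


if __name__ == "__main__":
    test_average()
    print("test passed; crashes on empty input:", crashes_on_empty_input())
```

Coverage measures which lines ran, not which behaviours were checked, so "all tests pass with full coverage" still leaves open what correct output is for inputs nobody wrote down.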

AstroBen 7 hours ago | parent

Right, and that's why it's only part of the job. The benchmarks they're currently running consist of the AI being handed a detailed spec plus tests to make pass, which isn't really what developing a feature looks like.

Going from a fuzzy, under-defined spec to something well defined isn't solved.

Going from a well-defined spec to verification criteria also isn't.

Once those are in place, though, we get https://vinext.io, which from what I understand they largely vibe-coded using Next.js's test suite.

> First one that comes to mind is that 100% code coverage in tests means that software is perfect

I agree, but I'm also not sure software needs to be perfect.
