Right, and that's why it's only part of the job. The benchmarks they're currently running consist of the AI being handed a detailed spec plus tests to make pass, which isn't really what developing a feature looks like.

Going from a fuzzy, under-defined spec to something well-defined isn't solved.

Going from a well-defined spec to verification criteria isn't solved either.

Once those are in place, though, we get things like https://vinext.io, which, from what I understand, they largely vibe-coded against NextJS's test suite.

> First one that comes to mind is that 100% code coverage in tests means that software is perfect

I agree, but I'm also not sure software needs to be perfect.