Hacker News new | past | comments | ask | show | jobs | submit
> the binding constraint has moved from can you build it to can you tell whether it’s right.

My suspicion is we are still moving up along a continuum of capability.

Models didn’t used to produce coherent sentences (GPT-2 era) and now they can. Past models (GPT-3 era) made syntax errors and now models can write well structured code. Past models didn’t reliably emit correct syntax to request a tool call, or track context across multiple tool calls - and now they can.

Frontier models can’t write code without glaring security flaws, even as they already follow other best practices. So on those two criteria of code quality we are still in need of models improvements.

All these forms of correctness lie along a continuum. Today’s models can’t assess what’s needed in a domain for work to be “good” - but if current trends hold it’s just a matter of time.