You're misreading "test" here as a programming test, rather than a test of the model's capabilities.