Also notable is the absence of the DeepSWE benchmark, where they do badly, yet somehow a benchmark that was published yesterday made it into this announcement.