Story Detail of id 42796967 | Liveview Hacker News

What is the evidence for 1)? I thought that the latest models were getting "somewhere" with fairly trivial reasoning tests like ARC-1.

It may be that you can just find the solution for these tests by interpolating from a very large dataset.