A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
LLMs are just the beginning, we'll see more specialized math AI resembling StockFish soon.
However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.
This is the caliber of thinking in unimpaired AI bullishness.
Dystopia vibes from the fictional "Manna" management system [0] used at a hamburger franchise, which involved a lot of "reverse centaur" automation.
> At any given moment Manna had a list of things that it needed to do. There were orders coming in from the cash registers, so Manna directed employees to prepare those meals. There were also toilets to be scrubbed on a regular basis, floors to mop, tables to wipe, sidewalks to sweep, buns to defrost, inventory to rotate, windows to wash and so on. Manna kept track of the hundreds of tasks that needed to get done, and assigned each task to an employee one at a time. [...]
> At the end of the shift Manna always said the same thing. “You are done for today. Thank you for your help.” Then you took off your headset and put it back on the rack to recharge. The first few minutes off the headset were always disorienting — there had been this voice in your head telling you exactly what to do in minute detail for six or eight hours. You had to turn your brain back on to get out of the restaurant.
Most repeat customers use the app, which sports the digital equivalent of a loyalty program, and various coupons. And lets you save your 'usual' order with customizations etc. Plus the annoying push notifications for FreeFrydays or whatever. And upsells, new product launches, etc.
My recollection is that the kiosk is just a weak facsimile of the app. And wasn't terrible, but everyone's standards vary.
There's much more to being human than our "cognitive abilities"
Not obvious and in fact I think the opposite is way more likely. Chess is well-defined and self-contained in a way that managing a restaurant with fleshy customers never will be.
Stockfish's neural net evaluation model was trained on millions of its positions with its own original algorithmic evaluation function (entirely developed by humans) and search tree. The result was a much smaller model than Leela's that requires little computation (not even a GPU), paired with its already extremely efficient search/pruning algorithms that made it stronger than Leela in competitive play. Leela's evaluation function is much stronger (at one ply it has an ELO of around 2300, Stockfish is probably closer to 1800), but it requires vastly more resources and those are always bounded in a match.
Humans haven't learned as much new information about chess from Stockfish as we have from Leela.
All AI proofs so far, including this one, are using existing tools in new ways, rather than inventing new tools. This is not surprising if you know how these models are trained. These existing tools are in distribution. New tools are not.
Problems worth of a Fields Medal likely require new tools to be invented. Thus it is not clear whether progress within the confines of the current paradigm is enough.
We could get this weird spiky situation where the AI is insanely superhuman at all problem solving, but completely incapable of coming up with a single new tool. It discovers everything there is to discover, subject to existing axioms and concepts.
Timothy Gowers gives some commentary on this in the attached PDF.
We have that chess board for quite a while now, over 40 years. And no, there is nothing special about Lean here, it is just herd mentality. Also, we don't know how much training with Lean helped this particular model.
https://www.anthropic.com/research/project-vend-1 https://www.wsj.com/tech/ai/anthropic-claude-ai-vending-mach...
(Two different examples of a similar idea)
Heuristically weighted directed graphs? Wow amazing I'm sure nobody has done that before.
Math is a sequence of formal rules applied to construct a proof tree. Therefore an AI trained on these rules could be far more efficient, and search far deeper into proof space