> A difficult part was constructing a chess board on which to play math (Lean). Now it's just pattern recognition and computation.
However, this was not verified in Lean. This was purely plain language in and out. I think, in many ways, this is a quite exciting demonstration of exactly the opposite of the point you're making. Verification comes in when you want to offload checking proofs to computers as well. As it stands, this proof was hand-verified by a group of mathematicians in the field.
Yeah, but I wouldn't be surprised if they train the model on verification assisted by Lean.
Arguing similarly to how stockfish, the chess engine, trains I would not be surprised if this is more common in the future. I don't know if they use any proof verification tools during their reinforcement learning procedure right now, as far as I know they've been focusing more on COT based strategies (w/o Lean). But I'm hardly an LLM expert, I don't know.
They most definitely threw in rl with formal verification somewhere between GPT 4 and now. The models are better at not hallucinating. I don't think their IMO team are only show ponies...
Same could be said for human mathematicians that learn from tools like Lean.
That may be true for now, but it seems clear enough that letting the model use Lean in its internal reasoning process would be a great idea
That I'd agree with! I really need to get around to learning Lean myself. It might be interesting to try and formalize some missing theoretical pieces from my field (or likely start smaller).
how would they calculate "probability of solving" without automated verification?
> However, this was not verified in Lean.
This is the caliber of thinking in unimpaired AI bullishness.