Ah, I meant that MCTS uses more inference-time compute than GRPO to produce a training sample.