Hacker News

VibeThinker: 3B param model that beats Opus 4.5 on reasoning with novel SFT+GRPO

https://arxiv.org/abs/2606.16140