Hacker News new | past | comments | ask | show | jobs | submit
Do you want to see scaling curves wrt data and param size? I agree that 1.2B and 10B tokens is not representative, but what scale of parameters and dataset sizes would be convincing?
Not to sound facetious, but perhaps enough runs at different param/token sizings to define a curve?