In 2019, GPT-2 1.5B was trained on ~10B tokens.

Last week Hugging Face released SmolLM v2 1.7B, trained on 11T tokens: roughly 3 orders of magnitude more training data for about the same parameter count and almost the same architecture.
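A quick back-of-the-envelope check of that ratio, assuming ~10B tokens for GPT-2 and ~11T for SmolLM v2 (a sketch, not figures from either release):

  import math

  gpt2_tokens = 10e9      # ~10B tokens (GPT-2 1.5B, 2019)
  smollm2_tokens = 11e12  # ~11T tokens (SmolLM v2 1.7B)

  ratio = smollm2_tokens / gpt2_tokens
  print(f"ratio: {ratio:.0f}x")                           # ~1100x
  print(f"orders of magnitude: {math.log10(ratio):.1f}")  # ~3.0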

So in hindsight, even in 2019 we were working with a tiny amount of data compared to what is routine now.
