Hacker News new | past | comments | ask | show | jobs | submit
Answering to people arguing against my comment: you guys do not seem to take into account that the technical circumstances were totally different thirty, twenty or even ten years ago! People would have liked to train with more data, and there was a big interest in combining heterogeneous datasets to achieve exactly that. But one major problem was the compute! There weren't any pretrained models that you specialized in one way or the other - you always retrained from scratch. I mean, even today, who's get the capability to train a multibillion GPT from scratch? And not just retraining once a tried and trusted architecture+dataset, no, I mean as a research project trying to optimize your setup towards a certain goal.