Story Detail of id 48314035 | Liveview Hacker News

manmal 22 hours ago | on: Claude Opus 4.8

But how? The training data is the unadulterated content those models are based on? I genuinely don’t understand, no snark.

wtallis 18 hours ago | parent

Raw training data is raw. A really big model trained on it has already done a first pass of finding patterns and squeezing out redundancy. Re-ingesting the full training set to train a smaller model is probably more expensive, for only a marginal quality improvement over distilling from the large model.

adgjlsfhk1 16 hours ago | root | parent

Distilling from a larger model is not only probably cheaper than training from the raw data, it's also likely to give a higher-quality result. There's pretty strong support for the proposition that NNs learn a smoothed and regularized version of the data. The NNs are likely higher quality than most of the data they are trained on.
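
To make the "smoothed version of the data" point concrete, here is a minimal sketch of the two training signals a small model could receive for one token position. It is not how any particular lab trains. The logits, the 3-token vocabulary, and the temperature are made-up values for illustration, and the T^2 scaling is the common convention from the knowledge-distillation literature rather than anything stated in this thread.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """Loss against a target distribution (here: a one-hot raw-data label)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def kl_divergence(target, predicted):
    """KL(target || predicted): how far the student is from the teacher."""
    return sum(t * math.log(t / p) for t, p in zip(target, predicted) if t > 0)

# Hypothetical logits over a 3-token vocabulary (assumed values).
teacher_logits = [4.0, 2.5, -1.0]
student_logits = [2.0, 1.0, 0.0]

# Raw data only says which token actually came next.
hard_label = [1.0, 0.0, 0.0]

T = 2.0  # assumed distillation temperature
teacher_soft = softmax(teacher_logits, T)
student_soft = softmax(student_logits, T)

# Training from data: the signal is a single one-hot target.
hard_loss = cross_entropy(hard_label, softmax(student_logits))

# Distillation: the signal is the teacher's whole distribution,
# scaled by T^2 so gradient magnitudes stay comparable across temperatures.
distill_loss = (T ** 2) * kl_divergence(teacher_soft, student_soft)

print("teacher soft targets:", [round(p, 3) for p in teacher_soft])
print("hard-label loss:", round(hard_loss, 4))
print("distillation loss:", round(distill_loss, 4))
```

The one-hot label tells the student nothing about the tokens that did not occur, while the teacher's soft targets assign every token a nonzero probability, so each training example carries more information per step.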
