>We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.
Does this mean that if your model is "overfitting", the solution is to train for even more epochs?
Right, isn't double descent one of the reasons modern Extremely Large Language Models work at all? I think I heard somewhere that basically all of today's "smart" LLMs (reasoning, solving math problems, etc.) are trained in "double descent" territory (whatever that means, I'm not entirely sure).
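For anyone who wants to poke at this: the quoted sentence is about growing the model, not about training for more epochs, and that version is easy to try on a toy problem. Below is a rough sketch, not the paper's setup. It swaps in random ReLU features with a minimum-norm least-squares fit, and every name and default in it (`double_descent_sweep`, `widths`, `noise`, the sizes) is made up for illustration. With noisy labels, the test error in this kind of setup typically spikes when the number of features is close to the number of training points and then drops again as the model gets wider, but the exact curve depends on the seed and sizes.

```python
import numpy as np


def double_descent_sweep(n_train=40, n_test=500, d=5,
                         widths=(5, 10, 20, 40, 80, 400),
                         noise=0.5, seed=0):
    """Fit random-ReLU-feature regressions of increasing width.

    Returns {width: (train_mse, test_mse)}. All names and defaults
    here are hypothetical, chosen only for this toy illustration.
    """
    rng = np.random.default_rng(seed)

    # Ground truth is a plain linear function; training labels are noisy.
    w_true = rng.normal(size=d)
    X_tr = rng.normal(size=(n_train, d))
    X_te = rng.normal(size=(n_test, d))
    y_tr = X_tr @ w_true + noise * rng.normal(size=n_train)
    y_te = X_te @ w_true  # noiseless targets for evaluation

    results = {}
    for p in widths:
        # "Model size" = number of fixed random ReLU features.
        W = rng.normal(size=(d, p)) / np.sqrt(d)
        F_tr = np.maximum(X_tr @ W, 0.0)
        F_te = np.maximum(X_te @ W, 0.0)

        # The pseudo-inverse gives the least-squares fit when p < n_train
        # and the minimum-norm interpolating fit when p >= n_train.
        beta = np.linalg.pinv(F_tr) @ y_tr

        train_mse = float(np.mean((F_tr @ beta - y_tr) ** 2))
        test_mse = float(np.mean((F_te @ beta - y_te) ** 2))
        results[p] = (train_mse, test_mse)
    return results


if __name__ == "__main__":
    for p, (tr, te) in double_descent_sweep().items():
        print(f"width={p:4d}  train_mse={tr:.4f}  test_mse={te:.4f}")
```

The thing to look at is the test error as a function of width, especially around width equal to `n_train`. The widest models fit the training set essentially exactly, which is the "overfitting" regime the question is about, and the interesting part is that test error there need not be the worst of the sweep.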