Story Detail of id 48247148 | Liveview Hacker News

noosphr9 hours ago | on: Making Deep Learning Go Brrrr from First Principles (2022)

>We show that a variety of modern deep learning tasks exhibit a "double-descent" phenomenon where, as we increase model size, performance first gets worse and then gets better.

smallerize7 hours ago | parent | next

Does this mean that if your model is "overfitting", the solution is to train for even more epochs?

ForceBru8 hours ago | parent

Right, isn't double descent one of the reasons why modern Extremely Large Language Models work at all? I think I heard somewhere that basically all today's "smart" (reasoning, solving math problems, etc) LLMs are trained in the "double descent" territory (whatever this means, I'm not entirely sure).

loading story #48248956

loading story #48247884

#visit	13,338,211
#session	74,665
#live-session	0