Making Deep Learning Go Brrrr from First Principles (2022)

https://horace.io/brrr_intro.html

117 | tosh | 7 hours ago | 43 | HN

tosh | 7 hours ago

> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS

wild
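
The quoted ratio works out exactly if you divide an A100's peak dense FP16/BF16 tensor-core throughput (about 312 TFLOPS) by roughly 32 million scalar additions per second in pure Python: 312e12 / 32e6 = 9.75e6. Below is a minimal sketch of that arithmetic, plus a crude way to measure your own machine's Python rate. Both constants are assumptions: 312 TFLOPS is NVIDIA's quoted peak, not something the thread states, and the Python rate varies by CPU and interpreter version.

```python
import timeit

# Assumed figures, not measured here:
# - A100 peak dense FP16/BF16 tensor-core throughput (~312 TFLOPS).
# - A rough rate of scalar float additions in a pure-Python loop.
A100_PEAK_FLOPS = 312e12
ASSUMED_PYTHON_FLOPS = 32e6


def gpu_flops_per_python_flop(gpu_flops: float, python_flops: float) -> float:
    """Number of GPU FLOPs that fit in the time Python performs one FLOP."""
    return gpu_flops / python_flops


def measure_python_flops(n: int = 1_000_000) -> float:
    """Crudely estimate float additions per second in a pure-Python loop.

    The timing includes loop and interpreter overhead, which is the point:
    that overhead, not the addition itself, dominates.
    """
    def loop() -> float:
        x = 0.0
        for _ in range(n):
            x += 1.0
        return x

    # Take the best of a few runs to reduce noise from other processes.
    seconds = min(timeit.repeat(loop, number=1, repeat=3))
    return n / seconds


if __name__ == "__main__":
    ratio = gpu_flops_per_python_flop(A100_PEAK_FLOPS, ASSUMED_PYTHON_FLOPS)
    print(f"assumed ratio:  {ratio:,.0f}")  # assumed ratio:  9,750,000

    measured = measure_python_flops()
    print(f"measured ratio: {gpu_flops_per_python_flop(A100_PEAK_FLOPS, measured):,.0f}")
```

The measured ratio will differ from 9.75 million depending on hardware; the order of magnitude is what matters.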

noosphr | 7 hours ago

>For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.

https://arxiv.org/abs/1912.02292
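
The rule of thumb in the quoted passage can be sketched as a small check. This is a minimal sketch under assumptions: `is_overfitting`, `capacity_advice`, and the default `gap_ratio` of 2.0 are hypothetical, since the quote only says "way lower" without giving a number. The linked paper (Deep Double Descent) is the reason to hold the rule loosely: it reports that test error can fall again as models grow past the size where they just fit the training set.

```python
def is_overfitting(train_loss: float, test_loss: float, gap_ratio: float = 2.0) -> bool:
    """Heuristic from the quote: training loss far below test loss means overfitting.

    gap_ratio is a hypothetical threshold for "far below"; tune it per task.
    """
    if train_loss < 0 or test_loss < 0:
        raise ValueError("losses must be non-negative")
    if gap_ratio <= 1:
        raise ValueError("gap_ratio must be greater than 1")
    return test_loss > gap_ratio * train_loss


def capacity_advice(train_loss: float, test_loss: float, gap_ratio: float = 2.0) -> str:
    """Apply the rule of thumb: in the overfitting regime, extra capacity is wasted effort.

    Caveat: deep double descent (arXiv:1912.02292) shows test error can fall
    again once models grow well past the point of fitting the training set,
    so this is a heuristic, not a law.
    """
    if is_overfitting(train_loss, test_loss, gap_ratio):
        return "overfitting: increasing capacity is likely wasted effort"
    return "not clearly overfitting: increasing capacity may still help"


if __name__ == "__main__":
    print(capacity_advice(train_loss=0.05, test_loss=0.90))
    # overfitting: increasing capacity is likely wasted effort
    print(capacity_advice(train_loss=0.40, test_loss=0.45))
    # not clearly overfitting: increasing capacity may still help
```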

jdw64 | 6 hours ago

Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.
