Making Deep Learning Go Brrrr from First Principles (2022)
https://horace.io/brrr_intro.html
> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS
wild
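For anyone wondering where the 9.75 million comes from, here is a rough sketch of the arithmetic. The two input figures are assumptions recalled from the linked article, not stated in this thread: roughly 32 million Python additions per second, and a peak of about 312 TFLOPS for an A100. The function name is hypothetical.

```python
# Assumed inputs (recalled from the linked article, not from this thread).
PYTHON_ADDS_PER_SEC = 32e6   # assumption: ~32 million additions per second in Python
A100_FLOPS_PER_SEC = 312e12  # assumption: ~312 TFLOPS peak on an A100


def gpu_flops_per_python_op(gpu_flops_per_sec: float, python_ops_per_sec: float) -> float:
    """GPU FLOPs that fit in the time Python takes for one operation."""
    return gpu_flops_per_sec / python_ops_per_sec


ratio = gpu_flops_per_python_op(A100_FLOPS_PER_SEC, PYTHON_ADDS_PER_SEC)
print(f"{ratio:,.0f}")  # prints 9,750,000
```

Both inputs are best-case or ballpark numbers, so the ratio is an order-of-magnitude illustration of framework overhead rather than a measurement.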
> For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.
Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.