Making Deep Learning Go Brrrr from First Principles (2022)

https://horace.io/brrr_intro.html
> in the time that Python can perform a single FLOP, an A100 could have chewed through 9.75 million FLOPS

wild

> For example, getting good performance on a dataset with deep learning also involves a lot of guesswork. But, if your training loss is way lower than your test loss, you're in the "overfitting" regime, and you're wasting your time if you try to increase the capacity of your model.

https://arxiv.org/abs/1912.02292

Right now, all I know how to do is pull models from Hugging Face, but someday I want to build my own small LLM from scratch.