Tiny hackable CUDA language model implementation
https://github.com/markusheimerl/gpt
Looks very nice, but I can't find any numerical gradient checks, which are helpful for verifying that the backward pass is correct:
https://github.com/markusheimerl/gpt/blob/main/transformer/a...
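For anyone unfamiliar with the technique: a numerical gradient check perturbs each parameter by a small eps, estimates the derivative with a central difference, and compares that to what the backward pass produced. Below is a minimal sketch in C on a hypothetical toy loss. The names `toy_loss`, `toy_backward`, and `max_grad_rel_error` are made up for illustration and are not taken from the repository; the real model would plug in its own forward and backward passes.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARAMS 64

static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical toy forward pass: L(w) = 0.5 * (w . x - y)^2 */
double toy_loss(const double *w, const double *x, double y, size_t n) {
    double pred = 0.0;
    for (size_t i = 0; i < n; i++) pred += w[i] * x[i];
    return 0.5 * (pred - y) * (pred - y);
}

/* Hypothetical analytic backward pass: dL/dw_i = (w . x - y) * x_i */
void toy_backward(const double *w, const double *x, double y,
                  double *grad, size_t n) {
    double pred = 0.0;
    for (size_t i = 0; i < n; i++) pred += w[i] * x[i];
    for (size_t i = 0; i < n; i++) grad[i] = (pred - y) * x[i];
}

/* Largest relative error between the analytic gradient and a
 * central-difference estimate (L(w + eps) - L(w - eps)) / (2 * eps).
 * w is perturbed in place and restored before returning. */
double max_grad_rel_error(double *w, const double *x, double y,
                          size_t n, double eps) {
    assert(n > 0 && n <= MAX_PARAMS);
    double grad[MAX_PARAMS];
    toy_backward(w, x, y, grad, n);

    double worst = 0.0;
    for (size_t i = 0; i < n; i++) {
        double saved = w[i];
        w[i] = saved + eps;
        double loss_plus = toy_loss(w, x, y, n);
        w[i] = saved - eps;
        double loss_minus = toy_loss(w, x, y, n);
        w[i] = saved; /* restore the parameter */

        double numeric = (loss_plus - loss_minus) / (2.0 * eps);
        double denom = abs_d(numeric) + abs_d(grad[i]);
        if (denom < 1e-12) denom = 1e-12; /* avoid dividing by zero */
        double rel = abs_d(numeric - grad[i]) / denom;
        if (rel > worst) worst = rel;
    }
    return worst;
}
```

In double precision a correct backward pass typically gives a tiny relative error here, while a sign error or a missing term usually gives something close to 1. With float32 GPU kernels the achievable tolerance is much looser, so the threshold has to be chosen accordingly.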
I deleted the numerical checks a while back, after confirming that the backward pass is correct, to keep the code base lean. Running https://github.com/markusheimerl/gpt/blob/main/transformer/a... is also somewhat of a confirmation that the backward pass is correct, since an analytically incorrect backward pass can't fit synthetic data perfectly.
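To illustrate the idea of using a synthetic fit as a sanity check, here is a rough sketch in C. It is not the linked script: it swaps in a tiny linear model, and the data generator, the target weights, and the function names (`make_synthetic`, `loss_and_grad`, `fit_synthetic`) are all assumptions made up for the example. The targets are noise-free, so a correct gradient should be able to drive the loss to essentially zero.

```c
#include <assert.h>
#include <stddef.h>

#define N_SAMPLES 8
#define N_FEATURES 3

/* Fill x with deterministic inputs and y with noise-free targets
 * y = true_w . x (true_w is a hypothetical choice for this sketch). */
void make_synthetic(double x[N_SAMPLES][N_FEATURES], double y[N_SAMPLES]) {
    const double true_w[N_FEATURES] = {1.5, -2.0, 0.5};
    for (size_t s = 0; s < N_SAMPLES; s++) {
        y[s] = 0.0;
        for (size_t j = 0; j < N_FEATURES; j++) {
            x[s][j] = (double)((s * 7 + j * 3) % 5) - 2.0;
            y[s] += true_w[j] * x[s][j];
        }
    }
}

/* Returns 0.5 * mean squared error; if grad is non-NULL, also writes
 * the analytic gradient with respect to w into it. */
double loss_and_grad(const double *w, double x[N_SAMPLES][N_FEATURES],
                     const double *y, double *grad) {
    double loss = 0.0;
    if (grad) {
        for (size_t j = 0; j < N_FEATURES; j++) grad[j] = 0.0;
    }
    for (size_t s = 0; s < N_SAMPLES; s++) {
        double pred = 0.0;
        for (size_t j = 0; j < N_FEATURES; j++) pred += w[j] * x[s][j];
        double r = pred - y[s];
        loss += 0.5 * r * r / N_SAMPLES;
        if (grad) {
            for (size_t j = 0; j < N_FEATURES; j++)
                grad[j] += r * x[s][j] / N_SAMPLES;
        }
    }
    return loss;
}

/* Full-batch gradient descent on the synthetic data; returns the loss
 * measured after the last update. */
double fit_synthetic(double *w, int steps, double lr) {
    assert(steps > 0 && lr > 0.0);
    double x[N_SAMPLES][N_FEATURES], y[N_SAMPLES], grad[N_FEATURES];
    make_synthetic(x, y);
    for (int t = 0; t < steps; t++) {
        loss_and_grad(w, x, y, grad);
        for (size_t j = 0; j < N_FEATURES; j++) w[j] -= lr * grad[j];
    }
    return loss_and_grad(w, x, y, NULL);
}
```

Starting from zero weights, something like `fit_synthetic(w, 2000, 0.1)` should recover the target weights and leave a loss near zero. Flipping the sign of the gradient in `loss_and_grad` makes the loss grow instead, which is the kind of error this check exposes.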