Tiny hackable CUDA language model implementation

Looks very nice, but I can't find any numerical gradient checks, which are helpful for verifying that the backward pass is correct:

https://github.com/markusheimerl/gpt/blob/main/transformer/a...
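For context, a numerical gradient check compares the hand-derived gradient against a central finite difference of the loss. The sketch below is a minimal illustration in plain Python on a toy two-parameter tanh model. It is not code from the linked repository, and every name in it (`loss`, `analytic_grad`, `numerical_grad`, `max_relative_error`) is hypothetical.

```python
import math

def loss(w, x, y):
    """Toy loss (an assumption for illustration): 0.5 * (w[1] * tanh(w[0] * x) - y)^2."""
    h = math.tanh(w[0] * x)
    return 0.5 * (w[1] * h - y) ** 2

def analytic_grad(w, x, y):
    """Hand-derived gradient of the toy loss with respect to w[0] and w[1]."""
    h = math.tanh(w[0] * x)
    err = w[1] * h - y
    # Chain rule: d tanh(u)/du = 1 - tanh(u)^2
    return [err * w[1] * (1.0 - h * h) * x, err * h]

def numerical_grad(f, w, eps=1e-5):
    """Central finite difference of f, one parameter at a time."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[i] += eps
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * eps))
    return grad

def max_relative_error(a, b):
    """Largest |a_i - b_i| / (|a_i| + |b_i|) over all components."""
    return max(abs(p - q) / max(1e-12, abs(p) + abs(q)) for p, q in zip(a, b))

w = [0.6, 0.5]
x, y = 0.7, -0.3
g_analytic = analytic_grad(w, x, y)
g_numeric = numerical_grad(lambda v: loss(v, x, y), w)
rel_err = max_relative_error(g_analytic, g_numeric)
print("max relative error:", rel_err)
```

In double precision a correct backward pass should give a very small relative error here, while a sign or indexing mistake typically pushes it toward 1. In single precision, as is common on GPUs, a larger `eps` and a looser tolerance would likely be needed.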

markusheimerl 9 hours ago | parent

I deleted the numerical checks a while back, after confirming that the backward pass is correct, to keep the code base lean. Running https://github.com/markusheimerl/gpt/blob/main/transformer/a... is also somewhat of a confirmation, since an analytically incorrect backward pass can't fit synthetic data perfectly.
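The fit-to-synthetic-data argument can be illustrated on a toy problem. The sketch below is an assumption-laden example in plain Python, not the repository's test: it trains a linear model `a * x + b` on targets generated from known parameters, once with the correct gradient and once with a deliberately broken one that drops the bias gradient. All function names are hypothetical.

```python
def make_data():
    """Synthetic data from known parameters a = 2, b = -1 (chosen for illustration)."""
    xs = [i / 10.0 for i in range(-10, 11)]
    ys = [2.0 * x - 1.0 for x in xs]
    return xs, ys

def mse(a, b, xs, ys):
    """Mean squared error of the linear model a * x + b."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def correct_grad(a, b, xs, ys):
    """Analytic gradient of the mean squared error with respect to a and b."""
    n = len(xs)
    ga = sum(2.0 * (a * x + b - y) * x for x, y in zip(xs, ys)) / n
    gb = sum(2.0 * (a * x + b - y) for x, y in zip(xs, ys)) / n
    return ga, gb

def buggy_grad(a, b, xs, ys):
    """Deliberately wrong backward pass: the bias gradient is dropped."""
    ga, _ = correct_grad(a, b, xs, ys)
    return ga, 0.0

def train(grad_fn, steps=2000, lr=0.1):
    """Plain gradient descent from a = b = 0; returns the final loss."""
    xs, ys = make_data()
    a, b = 0.0, 0.0
    for _ in range(steps):
        ga, gb = grad_fn(a, b, xs, ys)
        a -= lr * ga
        b -= lr * gb
    return mse(a, b, xs, ys)

loss_correct = train(correct_grad)
loss_buggy = train(buggy_grad)
print("final loss, correct gradient:", loss_correct)
print("final loss, buggy gradient:  ", loss_buggy)
```

With the correct gradient the loss goes to essentially zero, while the broken version stalls at a clearly nonzero loss because the bias never moves. A fit like this is weaker evidence than a direct gradient check, which matches the "somewhat of a confirmation" wording above.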
