The math isn't that difficult. The transformers paper (https://proceedings.neurips.cc/paper_files/paper/2017/file/3...) was remarkably readable for such a high-impact paper. Beyond the AI/ML-specific terminology (attention) that was thrown out
Neural networks are basically just linear algebra (i.e., matrix multiplication) plus an activation function (ReLU, sigmoid, etc.) to generate non-linearities.
That's first-year undergrad in most engineering programs - a fair number even took it in high school.
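
To make the "matrix multiplication plus an activation" point concrete, here is a rough sketch of a two-layer forward pass in plain Python. The weights, biases, and input are made-up numbers chosen for illustration, and the helper names (`matvec`, `relu`, `layer`) are hypothetical, not taken from the paper or any library.

```python
def matvec(W, x):
    # Matrix-vector product: one dot product per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    # ReLU activation: clamp negative entries to zero.
    return [max(0.0, a) for a in v]

def layer(W, b, x):
    # One dense layer: linear map, add bias, then the non-linearity.
    return relu([z + bi for z, bi in zip(matvec(W, x), b)])

# Hypothetical parameters (assumed values, purely illustrative).
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -1.0]
W2 = [[1.0, 2.0]]
b2 = [0.5]

x = [2.0, 1.0]
hidden = layer(W1, b1, x)       # [1.0, 0.5]
output = layer(W2, b2, hidden)  # [2.5]
print(hidden, output)
```

Without the `relu` step, stacking the two layers would collapse into a single linear map, which is why the activation function matters.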