Stupid question: Is there any chance that I, as an engineer, can get away from learning the Math side of AI but still drill deeper into the lower level of CUDA or even GPU architecture? If so, how do I start? I guess I should learn about optimization and why we chose to use GPU for certain computations.
Parallel question: I work as a Data Engineer and always wonder if it's possible to get into MLE or AI Data Engineering without knowing AI/ML. I thought I only need to know what the data looks like, but so far I see every job description of an MLE requires background in AI.
loading story #43122345
The math isn't that difficult. The transformers paper (https://proceedings.neurips.cc/paper_files/paper/2017/file/3...) was remarkably readable for such a high impact paper. Beyond the AI/ML specific terminology (attention) that were thrown out
Neural networks are basically just linear algebra (i.e matrix multiplication) plus an activation function (ReLu, sigmoid, etc.) to generate non-linearities.
Thats first year undergrad in most engineering programs - a fair amount even took it in high school.
I'd like to re-enforce this viewpoint. The math is non-trivial, but if you're a software engineer, you have the skills required to learn _enough_ of it to be useful in the domain. It's a subject which demands an enormous amount of rote learning - exactly the same as software engineering.
hot take: i don't think you even need to understand much linear algebra/calculus to understand what a transformer does. like the math for that could probably be learned within a week of focused effort.
loading story #43129695
loading story #43129436
loading story #43128442
loading story #43130123
loading story #43124461
loading story #43129538
loading story #43126744
loading story #43122355
loading story #43124586
loading story #43125976
loading story #43126642
loading story #43123430
loading story #43128424
loading story #43123642
loading story #43125070