Ultrafast machine learning on FPGAs via Kolmogorov-Arnold Networks
https://aarushgupta.io/posts/kan-fpga/
So for people wondering if it can be used to accelerate LLM inference, sadly not.
I've been trying to hit 100,000 tokens/s with a dumb 3.28M model, and even this is an order of magnitude too large to benefit.
It appears to be focused more on latency than throughput. Happy to be corrected.
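For a rough sense of scale, the two numbers in the comment above can be turned into a per-token time budget and a required compute rate. This is only a back-of-envelope sketch: it assumes the 3.28M figure is a parameter count and that a dense model does roughly one multiply-accumulate (MAC) per parameter per token, neither of which is stated in the thread. The function names are illustrative.

```python
# Back-of-envelope budget for the throughput target quoted above.
# Assumptions (not from the thread): 3.28M is a parameter count, and a dense
# model performs about one multiply-accumulate (MAC) per parameter per token.
PARAMS = 3.28e6          # model size quoted in the comment
TOKENS_PER_S = 100_000   # throughput target quoted in the comment


def per_token_budget_us(tokens_per_s: float) -> float:
    """Time available per token in microseconds, if tokens are produced one at a time."""
    return 1e6 / tokens_per_s


def required_macs_per_s(params: float, tokens_per_s: float) -> float:
    """Sustained MAC rate needed under the one-MAC-per-parameter assumption."""
    return params * tokens_per_s


budget_us = per_token_budget_us(TOKENS_PER_S)
macs = required_macs_per_s(PARAMS, TOKENS_PER_S)
print(f"{budget_us:.1f} us per token, {macs:.2e} MAC/s")
```

Under these assumptions the target leaves about 10 microseconds per token, which is well above the sub-microsecond latencies discussed further down the thread.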
Right. But... this would limit you to either extremely small models or extremely large FPGAs, yes? If there's a simple machine learning task that requires sub-microsecond latency I can see the point, but otherwise?
Yes, this work is focused on accelerating very small models, typically for real-time systems that require extremely low power or low latency.
One primary application of this work is in high-energy physics (https://home.cern/smarter-decisions-at-the-speed-of-collisio...). Ultrafast and real-time learning is also very applicable for problems in quantum computing, plasma control, etc. (https://arxiv.org/pdf/2602.02005).
Happy to hear that KANs continue to find solid footing.
This guy will be hired by a high-frequency trading firm, and the next time we hear about him, he will have a net worth in 9 figures.
Archive link, as it looks like the original post was taken down: https://web.archive.org/web/20260609200156/https://aarushgup...