Maybe the article originally featured a 1000-line C implementation.
I was basing this more on the fact that you don't have to look at C code to understand that non-cached transformer inference is going to be super slow.
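To put rough numbers on that, here is a back-of-the-envelope sketch, not taken from the article. It only counts causal attention scores for one head in one layer, and the function names and the `prompt_len` / `new_tokens` parameters are made up for illustration. Without a KV cache every decoding step reprocesses the whole sequence; with one, each step after the prefill only scores the newest token against the cached keys.

```python
def attention_scores_without_cache(prompt_len: int, new_tokens: int) -> int:
    """Count causal attention scores when every step recomputes the full sequence."""
    total = 0
    for step in range(new_tokens):
        seq_len = prompt_len + step
        # Causal attention: position i attends to i + 1 positions,
        # so a full pass over seq_len tokens costs seq_len * (seq_len + 1) / 2.
        total += seq_len * (seq_len + 1) // 2
    return total


def attention_scores_with_cache(prompt_len: int, new_tokens: int) -> int:
    """Count causal attention scores when keys and values are cached."""
    # Prefill: process the prompt once (this also yields the first new token).
    total = prompt_len * (prompt_len + 1) // 2
    for step in range(1, new_tokens):
        seq_len = prompt_len + step
        # Only the newest token's query is scored against all seq_len keys.
        total += seq_len
    return total


if __name__ == "__main__":
    for n in (16, 128, 1024):
        slow = attention_scores_without_cache(8, n)
        fast = attention_scores_with_cache(8, n)
        print(f"new_tokens={n}: without cache={slow}, with cache={fast}")
```

The uncached count grows roughly cubically in the number of generated tokens and the cached one roughly quadratically. The MLP and projection work is also redone for every position on each step when nothing is cached, so the real gap is larger than this count suggests.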
I don't see how that would be possible given the contents of the article.
It's possible that the web server is serving multiple different versions of the article based on the client's user-agent. Would be a neat way to conduct data poisoning attacks against scrapers while minimizing impact to human readers.