Story Detail of id 48409477 | Liveview Hacker News

>It's always the trade-off of a smart complex operation against an absolute crapload of dumb ones.

You can't make attention more specialized without making it less general, which makes LLMs worse as universal approximators.
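
To make that concrete, here's a toy sketch of one kind of specialization: restricting attention to a local window. This is only an illustration under my own assumptions, not how any particular model does it; `attention_weights` and its `window` parameter are made-up names, and the vectors are arbitrary.

```python
import math

def softmax(scores):
    # Numerically stable softmax; masked (-inf) entries come out as exactly 0.0.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys, window=None):
    """Row-wise scaled dot-product attention weights (hypothetical helper).

    window=None -> dense attention: every position can see every other.
    window=w    -> "specialized" local attention: position i only sees
                   positions j with |i - j| <= w.
    """
    d = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        row = []
        for j, k in enumerate(keys):
            if window is not None and abs(i - j) > window:
                row.append(float("-inf"))  # masked out by the window
            else:
                row.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights.append(softmax(row))
    return weights

# Tokens 0 and 5 are similar to each other; the ones in between are filler.
seq = [[4.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0], [4.0, 0.0]]

dense = attention_weights(seq, seq)
local = attention_weights(seq, seq, window=1)

print("dense: token 5 -> token 0 weight:", dense[5][0])
print("local: token 5 -> token 0 weight:", local[5][0])
```

The windowed version does less work per token, but the weight from token 5 to token 0 is forced to zero, so a long-range match that dense attention picks up is simply not expressible anymore.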