
phanarch11 hours ago | on: LLM Architecture Gallery

I'd push back slightly on the "no fundamental innovations" read, though: the innovations that stuck (MoE, GQA, RoPE) are almost entirely ones that improve GPU utilization, whether through better KV-cache efficiency, more parallelism in attention, or lower serving cost per parameter. Mamba and SSM-based hybrids are interesting but kept running into hardware friction.
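
To put a rough number on the KV-cache point, here is a back-of-the-envelope sketch. The config (32 layers, head_dim 128, 8k context, fp16) is a made-up example and not any particular model, and `kv_cache_bytes` is just a helper name for illustration. It only counts the K and V tensors and ignores paging, quantization, and allocator overhead.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Approximate KV-cache size: one K and one V tensor per layer."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical config: 32 query heads in both cases, only the KV head count differs.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=8192)  # one KV head per query head
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=8192)   # 4 query heads share each KV head

GIB = 2**30
print(f"MHA: {mha / GIB:.1f} GiB per sequence")
print(f"GQA: {gqa / GIB:.1f} GiB per sequence")
print(f"reduction: {mha // gqa}x")
```

Under those assumptions the cache shrinks in direct proportion to the number of KV heads, which is memory that can go to larger batches or longer contexts instead.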
