I'd push back slightly on the "no fundamental innovations" read, though. The innovations that stuck (MoE, GQA, RoPE) are almost entirely ones that improve GPU utilization: better KV-cache efficiency, more parallelism in attention, cheaper serving per parameter. Mamba and SSM-based hybrids are interesting but kept running into hardware friction.
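
To put a rough number on the KV-cache point, here's a back-of-the-envelope sketch. The config (32 layers, 32 query heads, head dim 128, 8k context, fp16) is made up for illustration, not any specific model, and `kv_cache_bytes` is just a helper name I picked:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_elem=2):
    """Approximate KV-cache size for one forward context.

    Counts one K and one V tensor per layer, each of shape
    (batch, n_kv_heads, seq_len, head_dim). Ignores paging
    overhead, quantization, and any framework bookkeeping.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical config, chosen only to make the arithmetic easy.
LAYERS, Q_HEADS, HEAD_DIM, SEQ = 32, 32, 128, 8192

mha = kv_cache_bytes(LAYERS, n_kv_heads=Q_HEADS, head_dim=HEAD_DIM, seq_len=SEQ)
gqa = kv_cache_bytes(LAYERS, n_kv_heads=8, head_dim=HEAD_DIM, seq_len=SEQ)

print(f"MHA (32 KV heads): {mha / 2**30:.1f} GiB per sequence")
print(f"GQA (8 KV heads):  {gqa / 2**30:.1f} GiB per sequence")
```

Under those assumptions, going from 32 KV heads to 8 cuts the cache from 4 GiB to 1 GiB per sequence, which translates fairly directly into larger batches on the same card.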