Story Detail of id 47395592 | Liveview Hacker News

libraryofbabel 14 hours ago | on: LLM Architecture Gallery

Thanks for the note about Qwen3.5. I should keep up with this more. If only it were more relevant to my day-to-day work with LLMs!

I did consider MoEs but decided (pretty arbitrarily) that I wasn’t going to count them as a truly fundamental change. But I agree, they’re pretty important. There’s also RoPE, perhaps slightly less of a big deal but still a big difference from the earlier models. And of course there are lots of brilliant inference tricks like speculative decoding that have helped make big models more usable.
