The sliding window is for the draft model, not the main one. Active params are 42B because it's a sparse MoE, a common technique for keeping larger models from getting bottlenecked by memory bandwidth.
According to the spec [0] it seems to be for both, though maybe the spec is wrong.

128 sounds really tiny, I wonder if they mean some kind of blocks?

[0] https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash#4...
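
For anyone unsure what a window of 128 would mean if it is counted in tokens, here is a rough sketch of a causal sliding-window attention mask. The function name `sliding_window_mask` is made up for illustration, and counting the current token as part of the window is an assumption; this is not taken from the MiMo code or spec, and it says nothing about whether they actually mean tokens or blocks.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return mask[i][j], True when query position i may attend to key position j.

    Assumption: the window is counted in tokens and includes the current
    token, so each query sees at most `window` keys (itself plus the
    `window - 1` tokens before it). Real implementations may differ.
    """
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Small demo with a window of 3 so the pattern is easy to see.
mask = sliding_window_mask(seq_len=6, window=3)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

With `window=128`, each token would attend to at most 128 positions no matter how long the context is, which is why the number looks small next to a full-attention context length.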
