The sliding window is for the draft model, not the main one. It's 42B active params because it's a sparse MoE, which is a common technique for larger models to avoid getting bottlenecked by memory bandwidth.
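To put a rough number on the memory bandwidth point, here is a back-of-envelope sketch. It assumes single-stream decoding where every active weight is read once per token, 4-bit weights (guessed from the "FP4" in the repo name, ignoring scale factors and the KV cache), and a made-up 3 TB/s of memory bandwidth. The function names are hypothetical, and only the 42B figure comes from the thread.

```python
def weight_bytes_per_token(active_params: float, bits_per_param: float) -> float:
    """Lower bound on weight bytes read from memory to decode one token."""
    return active_params * bits_per_param / 8


def max_tokens_per_second(active_params: float, bits_per_param: float,
                          bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-bound ceiling on decode speed for a single stream."""
    return bandwidth_bytes_per_s / weight_bytes_per_token(active_params, bits_per_param)


ACTIVE_PARAMS = 42e9   # active params per token, from the comment above
BITS_PER_PARAM = 4     # assumption: FP4 weights, scale factors ignored
BANDWIDTH = 3e12       # assumption: hypothetical 3 TB/s accelerator

gb_per_token = weight_bytes_per_token(ACTIVE_PARAMS, BITS_PER_PARAM) / 1e9
ceiling = max_tokens_per_second(ACTIVE_PARAMS, BITS_PER_PARAM, BANDWIDTH)
print(f"{gb_per_token:.0f} GB of weights per token, ~{ceiling:.0f} tok/s ceiling")
```

The ceiling scales with active params rather than total params, which is the reason to go sparse: a dense model of the same total size would have to stream all of its weights for every token.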
Seems to be for both according to the spec [0], maybe it's wrong though.
128 sounds really tiny, I wonder if they mean some kind of blocks?
[0] https://huggingface.co/XiaomiMiMo/MiMo-V2.5-Pro-FP4-DFlash#4...
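For a sense of what a 128-token window means at the token level (as opposed to blocks), here is a minimal sketch of a causal sliding-window mask. It assumes the window counts the current token, and conventions differ on that; `sliding_window_mask` is a hypothetical helper, not anything taken from the MiMo code or the spec.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Windowed: only the most recent `window` positions,
    counting position i itself (an assumed convention).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Tiny case that is easy to eyeball: window of 3 over 5 tokens.
small = sliding_window_mask(seq_len=5, window=3)
for row in small:
    print("".join("x" if allowed else "." for allowed in row))

# With window=128, a late token sees exactly 128 keys, an early one sees fewer.
big = sliding_window_mask(seq_len=300, window=128)
print(sum(big[299]), sum(big[10]))
```

A single layer only looks back 128 tokens under this reading, but stacked layers can pass information further, since each layer's output already summarizes its own window.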