> It uses 384 routed experts (top-8) with hybrid attention (full-attention + sliding-window 128 at 6:1 ratio) over 70 layers (1 dense + 69 MoE)
https://recipes.vllm.ai/XiaomiMiMo/MiMo-V2.5-Pro
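
The figures in the quote can be sanity-checked with a little arithmetic. The sketch below is only an illustration built from the quoted numbers, not anything taken from the model's code or config. It assumes the 6:1 attention ratio is spread evenly over all 70 layers, and it does not assume which side of the ratio is sliding-window, since the quote does not say. The constant names and the `summarize` helper are hypothetical.

```python
# Numbers taken directly from the quoted description.
NUM_ROUTED_EXPERTS = 384
TOP_K = 8
NUM_LAYERS = 70
NUM_DENSE_LAYERS = 1
NUM_MOE_LAYERS = 69
ATTENTION_RATIO = (6, 1)  # quoted as "6:1"; which side is sliding-window is not stated


def summarize():
    """Derive a few quantities implied by the quoted architecture figures."""
    # The layer split must add up to the stated total.
    assert NUM_DENSE_LAYERS + NUM_MOE_LAYERS == NUM_LAYERS

    # Fraction of routed experts selected per token in a single MoE layer.
    active_routed_fraction = TOP_K / NUM_ROUTED_EXPERTS

    # Routed-expert selections made for one token across the whole stack.
    routed_selections_per_token = TOP_K * NUM_MOE_LAYERS

    # Assumption: the ratio repeats in blocks of 6 + 1 = 7 layers over all 70 layers.
    block = sum(ATTENTION_RATIO)
    blocks, leftover = divmod(NUM_LAYERS, block)

    return {
        "active_routed_fraction": active_routed_fraction,
        "routed_selections_per_token": routed_selections_per_token,
        "attention_blocks": blocks,
        "leftover_layers": leftover,
        "majority_attention_layers": blocks * ATTENTION_RATIO[0],
        "minority_attention_layers": blocks * ATTENTION_RATIO[1],
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these assumptions, each token activates 8 of 384 routed experts per MoE layer (about 2.1%), and 70 layers divide into exactly 10 blocks of 7, giving 60 layers of one attention type and 10 of the other.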