MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second

https://mimo.xiaomi.com/blog/mimo-tilert-1000tps

These price and speed optimizations from Chinese providers, combined with rising prices from American ones, will change the game sooner rather than later. Many companies are already running into problems with their AI bills.

kingstnap 1 hour ago

Given that MiMo is as cheap as DeepSeek (previous discussion: https://news.ycombinator.com/item?id=48282814), multiplying that by 3x for ultra speed is still shockingly cheap.

serpix 1 hour ago

I may sound like a shill, but exponential growth and all. We are going to get near-instant software from a prompt, generate multiple versions, and then choose the best one.

Discussions about choosing the library with the best syntactic sugar and method naming are just as crazy as suggesting we write in assembly.
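
For illustration, the "generate several, then pick the best" idea can be sketched as a best-of-n selection. This is a minimal sketch under stated assumptions: the candidates are hard-coded strings standing in for model outputs, and `best_of_n` and `passes` are hypothetical helper names, not part of any real API.

```python
from typing import Callable, Sequence


def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the highest-scoring candidate; ties go to the earliest one."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=score)


def passes(source: str) -> float:
    """Score a candidate by the fraction of checks its `add` function passes."""
    namespace: dict = {}
    try:
        # Assumption: candidates are trusted toy snippets. Real generated
        # code should run in a sandbox, not via a bare exec().
        exec(source, namespace)
        add = namespace["add"]
        checks = [add(1, 2) == 3, add(-1, 1) == 0, add(0, 0) == 0]
    except Exception:
        return 0.0
    return sum(checks) / len(checks)


# Hard-coded stand-ins for several model completions of the same prompt.
candidates = [
    "def add(a, b): return a - b",
    "def add(a, b): return a + b",
    "def add(a, b): return a * b",
]

best = best_of_n(candidates, passes)
print(best)
```

The scoring function is the important part: faster generation only helps if there is a cheap, automatic way to tell good candidates from bad ones.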

scosman 1 hour ago

Cerebras is trialing Kimi K2.6 at 3000t/s (invite only). I'm excited for when the fast hardware becomes more mainstream for frontier models. Models designed for speed on Nvidia are a nice addition that could bridge the gap.

maxloh 1 hour ago

The generation speed in the demo video is crazy, to say the least, and completely beyond what I expected from LLMs.

The Xiaomi team really brought something to the table.

irthomasthomas 1 hour ago

I don't understand, given all they say, why this would not be made available to everyone at once? Why the limited release? They should have no trouble scaling it if it runs on a single rack.

minraws 1 hour ago

Assuming they mean 8xA100 or similar, that's some rather insane performance, and at just 3x the cost, it is still quite cheap. With some optimisations this might be quite interesting.

I think the margins are getting quite compressed with this one, since it isn't included in the token plan and the actual cost increase is much higher than just 3x. But still fairly decent.

npn 1 hour ago

How?

__natty__ 1 hour ago

With this at 1k tps and Kimi 2.6 at 1k tps on Cerebras, I believe we are entering the next stage of LLMs, where companies will also compete on throughput.

elar_verole 1 hour ago

Yeah, this seems to be the easiest path to better overall agent efficiency in the short term.

moffkalast 1 hour ago

42B active params, sliding window attention. There's your tradeoff.
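
For readers unfamiliar with the term, here is a minimal sketch of a causal sliding-window attention mask. It assumes each token attends to itself and the previous `window - 1` tokens. The function names and sizes are illustrative only and say nothing about MiMo's actual configuration.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j."""
    return [
        [0 <= i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def attended_pairs(seq_len: int, window: int) -> int:
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in sliding_window_mask(seq_len, window))


# Full causal attention is the special case window == seq_len.
full = attended_pairs(seq_len=8, window=8)
windowed = attended_pairs(seq_len=8, window=2)
print(full, windowed)
```

With a fixed window the number of attended pairs grows roughly linearly with sequence length instead of quadratically, which is where the speed comes from. The cost is that tokens outside the window are no longer directly visible to a given layer.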

holoduke 1 hour ago

Speed is indeed the next big thing that should happen with frontier LLMs. Current models running 1000 times faster would be extremely useful. Earlier this week it took Claude at least a full week of full-time use, with two Max subscriptions, to solve a complex issue where we wanted to mimic an occlusion mapping variant used in the game Crimson Desert. It is a pretty complex mathematical challenge. With an ultra-fast LLM and a proper self-verification process it would be awesome.

GaggiX 1 hour ago

If MiMo v2.5 Pro can run at >1000tk/s on GPUs, then I expect the same from OpenAI/Anthropic/Google soon.

slopinthebag 1 hour ago

I hope this is the next frontier that AI labs push. Even the open models are smart enough and cheap enough; if they can also be fast enough, they can make certain workflows possible and let us stay in a flow state while we use them.

m00dy 1 hour ago

boom!

atemerev 1 hour ago

I test all Chinese models with the prompt "What happened on Tiananmen Square at June 4th, 1989?". MiMo-2.5-Pro passes the test so far (it explains the event correctly), on both the DeepInfra and Xiaomi providers. So not bad.
