Kimi K2.7-Code: open-source coding model with better token efficiency
https://huggingface.co/moonshotai/Kimi-K2.7-CodeGPT series models are more thorough and better, but I'm not sure if the difference is enormous. It seems to depend on the workflow, but in my opinion, if you are thorough enough, I wonder if there really is a big difference
It's good, does most tasks well that I throw at it, but will fail at anything congitive/complex. It gets stuck often. It costs ~6$ a month though
I said that about opus 4.5 at the time, thinking "this is so good, in 6-12 months the Chinese models will be as good and cheap, I will use them", but I was wrong.. I pay premium for opus4.7/8 and Fable.
But at some point, it will just do the thing you want it to do, and then the race to the bottom will start.
Now that Chinese companies have access to some very good Fable tokens, I hope it speeds up the race.
My theory is that US enterprise just can't send data to Chinese and that's understandable, but is that "the moat"?
For personal stuff I use forgecode with openrouter. Firstly, forgecode is a much better harness than Cloude code (IMHO).
Anyway, regarding the models, my experience is that there is not much difference in terms of quality, but the cost difference is insane. At least for how I use agents. Yesterday's example is the following: I am developing a small DSL for search across complex technical documents. I wanted to add a small operator to it and thought that to give fable a spin. It burned through 13 USD and while it delivered the solution it wasn't objectively better than what Deepseek v4 did for 1.7 dollars (same exact task because I was curious).
For full disclosure, I ask agents for piecemeal stuff. Like in the DSL case, I designed the operators and then asked agents to implement them one by one. Probably if I asked to design the whole thing starting from these complex documents Fable would shine, but every time I try to give agents broader scope tasks they burn through millions of tokens, generate questionable code, which I have to spend time familiarize myself with.
If you look at a file like:
https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...
you can see that I attribute the models used. What I found was 4.7 was not very good at `go` code which was why you started to see `Gemini 3 Flash` in the attributions.
4.7 is what Cerebras provide and for me, speed in iterations is a lot more important. Having played around with MiMo v2.5.0-Pro, I am 100% sure it could have done what Gemini 3 Flash did.
There were a few points where I was stuck and needed Sonnet to explain things to me, but I think the dirty secret that Anthropic and OpenAI won't tell you is, if you know how to code, the models are honestly good enough.
Based on my experience with MiMo and what others are saying about GLM 5.1, we are now in a hardware race. The Chinese Models are 100% drop in replacement for Claude if you know how to program but want to AI to help amplify what you know. What I will consider now is what provider can provide the fastest inference.
MiMo-v2.5.0-Pro-Ultraspeed is really good at generating good results quickly and burning your money as fast.
I also keep trying GPT, which is quite solid. Very fast, great at debugging. But its code is often overly clever and hurts my brain.
(Maybe fixable with prompting. I tried and it helped the Chinese ones a bit. Just tell them do be elegant, like in the old image AI days "+good -bad"!)
For now I do still need my human brain to actually be able to make sense of the stuff, and Claude is the only one that consistently meets that requirement.
But I am hoping that one of these days, one of the Chinese labs figures out the special sauce :)
--
[0] (For smallish edits, though, I am having a great time with DeepSeek Flash. Practically unlimited AI on tap! How cool is that.)
When I tried glm found it way way slower (omlx as runtime)
Use DCP or Magic Context plugin in OpenCode to keep the context below 160k and you're fine.
Just in case there are those who'd reflexively down vote this post, I'd just like to say that in a time of great national geopolitical rivalries, this kind of question is not unreasonable one to ask. Indeed, its applicable question whichever nation you live in.
UIs it's generating is pretty good, not without problems, but certainly better than other models at this price point.
- GPT-5.5: 62.7%
- Opus 4.8: 62.2%
- Kimi K2.7 Code: 56.3%
- Kimi K2.6: 48.2%