Hacker News

Kimi K2.7-Code: open-source coding model with better token efficiency

https://huggingface.co/moonshotai/Kimi-K2.7-Code
Reading their modified license terms cracks me up, because they've basically remade MIT as MIT plus the one clause the BSD license used to have. That clause didn't care about MAU or revenue: if you used it in a product, they basically asked you to 'advertise' them. Honestly, it's a reasonable request.
This is the Cursor callout.

Don't make us shame you into disclosure

Cursor had a specific licensing agreement that allowed them to brand it how they want.
Personally, when I use open code or routers, I feel that beyond a certain level, the models don't make a huge difference to me. Except for expensive and mediocre models like Gemini. In that sense, Chinese models are pretty good. I usually write code in function or method units and then design and assemble them together.

GPT series models are more thorough and better, but I'm not sure the difference is enormous. It seems to depend on the workflow, but in my opinion, if you are thorough enough, I wonder if there really is a big difference.

I would really love to know if anyone has any experience with something like opencode + Kimi K2.6/2.7 now compared to Claude Code. What is better, what is worse, what is the cost comparison. I am currently paying $100 for the 5x Max plan, but Fable is running through the usage limits quite drastically and I cannot really say it's night and day compared to Opus. Also, I use this mostly for my side projects, so the $100 bill is quite noticeable. I definitely don't want to pay more.
I can only talk about GLM 5.1 which is roughly at sonnet 4 levels imo.

It's good and does most tasks I throw at it well, but it will fail at anything cognitive/complex. It gets stuck often. It costs ~$6 a month though.

I think there is some threshold after which "best" model doesn't matter, we are not that far from it. Fable now is really good, in a year or so, if Kimi catches up, even if Fable6 is much better, I think I will use kimi at 1/10th of the price.

I said that about opus 4.5 at the time, thinking "this is so good, in 6-12 months the Chinese models will be as good and cheap, I will use them", but I was wrong. I pay a premium for opus4.7/8 and Fable.

But at some point, it will just do the thing you want it to do, and then the race to the bottom will start.

Now that Chinese companies have access to some very good Fable tokens, I hope it speeds up the race.

I was wondering how Anthropic and the like stay competitive when Opus ($5 / $25) is about 7x more expensive than Kimi K2.6 ($0.7 / $3.4) or other Chinese models, while being only marginally better.

My theory is that US enterprises just can't send data to Chinese providers, and that's understandable, but is that "the moat"?
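
For a rough sense of the gap, the quoted prices can be plugged into a small calculation. This is only a sketch: it assumes the figures are USD per million input / output tokens, and the 2M-input / 200k-output workload is a made-up example, not a measurement.

```python
# Prices as quoted in the comment, assumed to be USD per 1M tokens (input, output).
PRICES = {
    "Opus": (5.0, 25.0),
    "Kimi K2.6": (0.7, 3.4),
}


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one job at the quoted per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Hypothetical workload: 2M input tokens, 200k output tokens.
opus = job_cost("Opus", 2_000_000, 200_000)
kimi = job_cost("Kimi K2.6", 2_000_000, 200_000)
print(f"Opus: ${opus:.2f}, Kimi K2.6: ${kimi:.2f}, ratio: {opus / kimi:.1f}x")
```

On these numbers the per-token ratio is about 7.1x for input and 7.4x for output, so the blended ratio lands a little above 7x whatever the input/output mix is.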

I tested it properly and it seems like a rather decent improvement. At least it uses fewer tokens for the same task, which is a good enough reason for me to use it over K2.6 if I need an open model.
I think any new model that is not demonstrably maybe 20-30% above Deepseek v4 in capability, yet is priced above Deepseek's price per token, is almost automatically relegated to being a low-use model (maybe for planning).
Output tokens are almost 5x more expensive than mimov2.5 pro/dsv4pro. I’m curious to see if Kimi K2.7 is that much better. Feels like Kimi is positioning itself as the premium open-source model.
I am still very new to the open-weight/source models. If anyone is using them full-time, I’d really love to hear about the setup and how they perform, as I am considering moving my org off Anthropic products.
Anecdotal, but here's my experience.

For personal stuff I use forgecode with openrouter. Firstly, forgecode is a much better harness than Claude Code (IMHO).

Anyway, regarding the models, my experience is that there is not much difference in terms of quality, but the cost difference is insane. At least for how I use agents. Yesterday's example is the following: I am developing a small DSL for search across complex technical documents. I wanted to add a small operator to it and thought I'd give Fable a spin. It burned through 13 USD, and while it delivered the solution, it wasn't objectively better than what Deepseek v4 did for 1.7 USD (same exact task, because I was curious).

For full disclosure, I ask agents for piecemeal stuff. Like in the DSL case, I designed the operators and then asked agents to implement them one by one. Probably if I asked it to design the whole thing starting from these complex documents, Fable would shine, but every time I try to give agents broader-scope tasks they burn through millions of tokens and generate questionable code, which I then have to spend time familiarizing myself with.

I'm making DSLs a lot as an architecture pattern also. I'd be curious to know what stack you're using for this and how you're approaching it.
These models have open weights, but at the moment most flagship models are practically accessible only through third-party model providers. The main exception is models in the ~30B parameter range, which can still be run on consumer-grade GPUs. That said, even consumer GPUs have become increasingly expensive and difficult to justify in recent years.
You can definitely go above 30B on consumer hardware – 2x gpus, spark, mac, half byte quants etc.
I created this and I would say glm-4.7 accounts for 80% of the code in https://github.com/gitsense/gsc-cli

If you look at a file like:

https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...

you can see that I attribute the models used. What I found was 4.7 was not very good at `go` code which was why you started to see `Gemini 3 Flash` in the attributions.

4.7 is what Cerebras provides, and for me, speed of iteration is a lot more important. Having played around with MiMo v2.5.0-Pro, I am 100% sure it could have done what Gemini 3 Flash did.

There were a few points where I was stuck and needed Sonnet to explain things to me, but I think the dirty secret that Anthropic and OpenAI won't tell you is, if you know how to code, the models are honestly good enough.

Based on my experience with MiMo and what others are saying about GLM 5.1, we are now in a hardware race. The Chinese models are a 100% drop-in replacement for Claude if you know how to program but want AI to help amplify what you know. What I will consider now is which provider can provide the fastest inference.

MiMo-v2.5.0-Pro-Ultraspeed is really good at generating good results quickly and burning your money as fast.

I keep trying to switch to the Chinese models, but I keep finding myself asking Claude to fix their outputs. (Both functionality and style.) So I always end up switching back.[0]

I also keep trying GPT, which is quite solid. Very fast, great at debugging. But its code is often overly clever and hurts my brain.

(Maybe fixable with prompting. I tried and it helped the Chinese ones a bit. Just tell them to be elegant, like in the old image AI days "+good -bad"!)

For now I do still need my human brain to actually be able to make sense of the stuff, and Claude is the only one that consistently meets that requirement.

But I am hoping that one of these days, one of the Chinese labs figures out the special sauce :)

--

[0] (For smallish edits, though, I am having a great time with DeepSeek Flash. Practically unlimited AI on tap! How cool is that.)

I use glm5.1 plus pi with a few customized skills and am very happy with it. I hadn’t touched my Claude 5x plan for a couple of weeks but opened it back up in Claude code when fable was released and did a few tasks and still was happy to return to glm/pi.
Better than Qwen3.6-35B-A3B-8bit ?

When I tried glm I found it way, way slower (omlx as runtime).

I have been using deepseek v4 flash as my main model for everything ever since dwarf star came out. I run it on my M4 Max MacBook Pro with 128gb of memory. I run it usually as a server and connect to it over tailscale with my coding machine and use the Pi coding agent. It’s a big leap over using the Qwen models though it doesn’t have vision - so I still will run those when I use vision. GLM 4.7 flash was my previous go to for coding but I’ve completely switched to deepseek for all non-vision things.
Qwen 3.6 seems to be the strongest local model; it works OK on an RTX 5090 or a Mac with more than 32GB.
I used glm5/5.1 for 60 days. Certainly better than Sonnet 4.6, not as good as Opus or GPT.

Use DCP or Magic Context plugin in OpenCode to keep the context below 160k and you're fine.
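
How DCP or Magic Context do this internally isn't covered here. As a generic sketch of the idea of keeping a conversation under a token budget, one could drop the oldest non-system messages until an estimate fits. The `estimate_tokens` heuristic (about 4 characters per token), the `trim_history` name, and the message shape are all assumptions for illustration, not either plugin's API.

```python
def estimate_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token.
    return max(1, len(text) // 4)


def trim_history(messages: list[dict], budget: int = 160_000) -> list[dict]:
    """Drop the oldest non-system messages until the estimated total fits the budget.

    System messages are always kept and are returned first.
    """
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    total = sum(estimate_tokens(m["content"]) for m in system + rest)
    while rest and total > budget:
        # Remove the oldest remaining message and subtract its estimated size.
        total -= estimate_tokens(rest.pop(0)["content"])
    return system + rest
```

A real setup would use the model's own tokenizer and would probably summarize old turns instead of discarding them outright.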

Has anyone taken these open weight models from China and stripped the CCP out of them? I do not mean that snarkily, I mean review them thoroughly using techniques for weight introspection (concept activations) in response to things that one might expect would trigger deceptive/malicious behavior if the CCP had actually tried to implant context-specific behaviors (e.g. the accusation of generating vulnerable code if being used in American government applications, which I don't know if it was ever proven).

Just in case there are those who'd reflexively downvote this post, I'd just like to say that in a time of great national geopolitical rivalries, this kind of question is not an unreasonable one to ask. Indeed, it's an applicable question whichever nation you live in.

In OpenRouter, there is an "int4" tag for the Moonshot provider of Kimi K2.7 Code. Isn't that too low, particularly coming from the very developer of the model? Is that a mistake? How is it in their direct API offering?
Great! Finally follows custom tool call format (k2.6 couldn't). It's a good indicator of instructions following and agentic behaviour.

The UIs it generates are pretty good, not without problems, but certainly better than other models at this price point.

I think deepseek has crossed the threshold for being on par with opus 4.6 and kimi is doing a great job in shipping velocity.
Benchmark geometric mean

- GPT-5.5: 62.7%

- Opus 4.8: 62.2%

- Kimi K2.7 Code: 56.3%

- Kimi K2.6: 48.2%
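
The comment doesn't say which benchmarks go into these numbers. As a reminder of what a geometric mean does compared to a plain average, here is a minimal sketch with made-up per-benchmark scores (the values are hypothetical, not anyone's published results):

```python
from statistics import geometric_mean, mean

# Hypothetical per-benchmark scores in percent; NOT taken from any real model.
scores = [80.0, 60.0, 20.0]

arith = mean(scores)          # plain average of the scores
geo = geometric_mean(scores)  # n-th root of the product of the scores

print(f"arithmetic mean: {arith:.1f}%, geometric mean: {geo:.1f}%")
```

The geometric mean is never higher than the arithmetic mean, and one weak benchmark drags it down more, so a model can't hide a bad result behind a few very strong ones.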

This maps to what I'm seeing in practice. The gap between demo and production is consistently underestimated, especially around error handling and edge cases.
How is 2.7 a thing _now_ ? it's not even mentioned on moonshot's webpage..