Kimi K2.7-Code: open-source coding model with better token efficiency

https://huggingface.co/moonshotai/Kimi-K2.7-Code

303nekofneko | 7 hours ago | 155 | HN

giancarlostoro 4 hours ago | parent | next

Reading their modified license terms cracks me up, because they've basically remade the MIT license as MIT plus the one clause the BSD license used to have, which didn't care about MAU or revenue: if you used it in a product, they basically asked you to 'advertise' them. Honestly, it's a reasonable request.

htrp 4 hours ago | parent

This is the Cursor callout.

Don't make us shame you into disclosure.

maherbeg 2 hours ago | root | parent | next

Cursor had a specific licensing agreement that allowed them to brand it how they want.

jdw64 4 hours ago | parent | next

Personally, when I use open code or routers, I feel that beyond a certain level the models don't make a huge difference to me, except for expensive and mediocre models like Gemini. In that sense, Chinese models are pretty good. I usually write code in function or method units and then design and assemble them together.

GPT series models are more thorough and better, but I'm not sure the difference is enormous. It seems to depend on the workflow, but in my opinion, if you are thorough enough, I wonder if there really is a big difference.

shreedx 5 hours ago | parent | next

I would really love to know if anyone has any experience with something like opencode + Kimi K2.6/2.7 now compared to Claude Code. What is better, what is worse, and what is the cost comparison? I am currently paying $100 for the 5x Max plan, but Fable is running through the usage limits quite drastically and I cannot really say it's night and day compared to Opus. Also, I use this mostly for my side projects, so the $100 bill is quite noticeable. I definitely don't want to pay more.

ramon156 5 hours ago | parent | next

I can only talk about GLM 5.1, which is roughly at Sonnet 4 levels imo.

It's good and does most tasks I throw at it well, but it will fail at anything cognitive/complex. It gets stuck often. It costs ~$6 a month though.

jackdoe 5 hours ago | parent | next

I think there is some threshold after which the "best" model doesn't matter, and we are not that far from it. Fable is really good now; in a year or so, if Kimi catches up, even if Fable6 is much better, I think I will use Kimi at 1/10th of the price.

I said that about Opus 4.5 at the time, thinking "this is so good, in 6-12 months the Chinese models will be as good and cheap, I will use them", but I was wrong. I pay a premium for Opus 4.7/8 and Fable.

But at some point, it will just do the thing you want it to do, and then the race to the bottom will start.

Now that Chinese companies have access to some very good Fable tokens, I hope it speeds up the race.

yanis_t 6 hours ago | parent | next

I was wondering how Anthropic and the like stay competitive when Opus ($5 / $25) is more than 5x as expensive as Kimi K2.6 ($0.7 / $3.4) or other Chinese models, while being only marginally better.

My theory is that US enterprises just can't send data to Chinese providers, and that's understandable, but is that "the moat"?
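
Taking the quoted prices at face value (assumed here to be USD per million input / output tokens, and not independently verified), the price multiple can be checked directly. The sketch below only divides the figures from the comment; the workload in `job_cost` is hypothetical.

```python
# Per-million-token prices (USD) exactly as quoted in the comment above.
# Reading them as input / output prices is an assumption.
PRICES = {
    "Opus": {"input": 5.00, "output": 25.00},
    "Kimi K2.6": {"input": 0.70, "output": 3.40},
}


def price_ratio(expensive: str, cheap: str) -> dict[str, float]:
    """Return how many times more `expensive` costs than `cheap`, per token type."""
    return {
        kind: PRICES[expensive][kind] / PRICES[cheap][kind]
        for kind in ("input", "output")
    }


def job_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one job for the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    ratios = price_ratio("Opus", "Kimi K2.6")
    print({kind: round(value, 2) for kind, value in ratios.items()})

    # Hypothetical workload: 2M input tokens and 200k output tokens.
    for model in PRICES:
        print(model, round(job_cost(model, 2_000_000, 200_000), 2))
```

Whether that multiple is worth paying depends on how many fewer retries or tokens the pricier model needs for the same task, which this arithmetic does not capture.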

minraws 3 hours ago | parent | next

I tested it properly and it seems like a rather decent improvement. At least it uses fewer tokens for the same task, which is a good enough reason for me to use it over K2.6 if I need an open model.

343rwerfd 6 hours ago | parent | next

I think any new model that isn't demonstrably 20-30% more capable than Deepseek v4, yet is priced above Deepseek's per-token price, is almost automatically relegated to being a low-use model (maybe for planning).

Bnjoroge 2 hours ago | parent | next

Output tokens are almost 5x more expensive than MiMo v2.5 Pro/DeepSeek v4 Pro. I’m curious to see if Kimi K2.7 is that much better. Feels like Kimi is positioning itself as the premium open-source model.

bgins 6 hours ago | parent | next

I am still very new to the open-weight/source models. If anyone is using them full-time, I’d really love to hear about the setup and how they perform, as I am considering moving my org off Anthropic products.

marcyb5st 4 hours ago | parent | next

Anecdotal, but here's my experience.

For personal stuff I use forgecode with openrouter. Firstly, forgecode is a much better harness than Claude Code (IMHO).

Anyway, regarding the models, my experience is that there is not much difference in terms of quality, but the cost difference is insane. At least for how I use agents. Yesterday's example is the following: I am developing a small DSL for search across complex technical documents. I wanted to add a small operator to it and thought I'd give Fable a spin. It burned through 13 USD, and while it delivered the solution, it wasn't objectively better than what Deepseek v4 did for 1.7 dollars (the exact same task, because I was curious).

For full disclosure, I ask agents for piecemeal stuff. Like in the DSL case, I designed the operators and then asked agents to implement them one by one. Probably if I asked to design the whole thing starting from these complex documents Fable would shine, but every time I try to give agents broader scope tasks they burn through millions of tokens and generate questionable code, which I then have to spend time familiarizing myself with.

sroerick 2 hours ago | root | parent

I'm making DSLs a lot as an architecture pattern also. I'd be curious to know what stack you're using for this and how you're approaching it.

DragonBooster 5 hours ago | parent | next

These models have open weights, but at the moment most flagship models are practically accessible only through third-party model providers. The main exception is models in the ~30B parameter range, which can still be run on consumer-grade GPUs. That said, even consumer GPUs have become increasingly expensive and difficult to justify in recent years.

mirekrusin 5 hours ago | root | parent

You can definitely go above 30B on consumer hardware: 2x GPUs, Spark, Mac, half-byte quants, etc.
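
As a rough back-of-the-envelope check on the "half byte quants" point, weight memory scales as parameters times bits per weight divided by 8. The sketch below covers the weights only; KV cache, activations, and runtime overhead are ignored, and the 70B and 120B sizes are illustrative values, not models named in the thread.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


if __name__ == "__main__":
    # 30B is the size mentioned in the thread; 70B and 120B are hypothetical.
    for params in (30, 70, 120):
        for bits in (16, 8, 4):
            size = weight_memory_gb(params, bits)
            print(f"{params}B @ {bits}-bit: ~{size:.0f} GB of weights")
```

Under this estimate a 4-bit ("half byte") quantization needs a quarter of the 16-bit footprint, which is why larger models can fit across two GPUs or into unified memory.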

sdesol 4 hours ago | parent | next

I created this and I would say glm-4.7 accounts for 80% of the code in https://github.com/gitsense/gsc-cli

If you look at a file like:

https://github.com/gitsense/gsc-cli/blob/main/internal/cli/r...

you can see that I attribute the models used. What I found was that 4.7 was not very good at `go` code, which is why you start to see `Gemini 3 Flash` in the attributions.

4.7 is what Cerebras provides, and for me, speed in iterations is a lot more important. Having played around with MiMo v2.5.0-Pro, I am 100% sure it could have done what Gemini 3 Flash did.

There were a few points where I was stuck and needed Sonnet to explain things to me, but I think the dirty secret that Anthropic and OpenAI won't tell you is that, if you know how to code, the models are honestly good enough.

Based on my experience with MiMo and what others are saying about GLM 5.1, we are now in a hardware race. The Chinese models are a 100% drop-in replacement for Claude if you know how to program but want AI to help amplify what you know. What I will consider now is which provider can provide the fastest inference.

MiMo-v2.5.0-Pro-Ultraspeed is really good at generating good results quickly and burning your money as fast.

andai 5 hours ago | parent | next

I keep trying to switch to the Chinese models, but I keep finding myself asking Claude to fix their outputs. (Both functionality and style.) So I always end up switching back.[0]

I also keep trying GPT, which is quite solid. Very fast, great at debugging. But its code is often overly clever and hurts my brain.

(Maybe fixable with prompting. I tried and it helped the Chinese ones a bit. Just tell them to be elegant, like in the old image AI days "+good -bad"!)

For now I do still need my human brain to actually be able to make sense of the stuff, and Claude is the only one that consistently meets that requirement.

But I am hoping that one of these days, one of the Chinese labs figures out the special sauce :)

[0] (For smallish edits, though, I am having a great time with DeepSeek Flash. Practically unlimited AI on tap! How cool is that.)

scottcha 5 hours ago | parent | next

I use glm5.1 plus pi with a few customized skills and am very happy with it. I hadn’t touched my Claude 5x plan for a couple of weeks, but I opened it back up in Claude Code when Fable was released and did a few tasks, and I was still happy to return to glm/pi.

sebastianconcpt 5 hours ago | root | parent

Better than Qwen3.6-35B-A3B-8bit?

When I tried GLM I found it way, way slower (omlx as runtime).

kamranjon 4 hours ago | parent | next

I have been using deepseek v4 flash as my main model for everything ever since dwarf star came out. I run it on my M4 Max MacBook Pro with 128 GB of memory. I usually run it as a server, connect to it over tailscale from my coding machine, and use the Pi coding agent. It’s a big leap over the Qwen models, though it doesn’t have vision, so I still run those when I need vision. GLM 4.7 flash was my previous go-to for coding, but I’ve completely switched to deepseek for all non-vision things.

trollbridge 4 hours ago | parent | next

Qwen 3.6 seems to be the strongest local model; it works OK on an RTX 5090 or a Mac with more than 32 GB.

polski-g 3 hours ago | parent

I used glm5/5.1 for 60 days. Certainly better than Sonnet 4.6, not as good as Opus or GPT.

Use the DCP or Magic Context plugin in OpenCode to keep the context below 160k and you're fine.

SubiculumCode 1 hour ago | parent | next

Has anyone taken these open weight models from China and stripped the CCP out of them? I do not mean that snarkily. I mean reviewing them thoroughly using techniques for weight introspection (concept activations) in response to things that one might expect would trigger deceptive/malicious behavior if the CCP had actually tried to implant context-specific behaviors (e.g. the accusation of generating vulnerable code if being used in American government applications, which I don't know if it was ever proven).

Just in case there are those who'd reflexively downvote this post, I'd just like to say that in a time of great national geopolitical rivalries, this kind of question is not an unreasonable one to ask. Indeed, it's an applicable question whichever nation you live in.

theanonymousone 2 hours ago | parent | next

In OpenRouter, there is an "int4" tag for the Moonshot provider of Kimi K2.7 Code. Isn't that too low, particularly coming from the very developer of the model? Is that a mistake? How is it in their direct API offering?

pcwelder 3 hours ago | parent | next

Great! It finally follows a custom tool call format (k2.6 couldn't). That's a good indicator of instruction following and agentic behaviour.

The UIs it generates are pretty good, not without problems, but certainly better than other models at this price point.

RIshabh235 4 hours ago | parent | next

I think Deepseek has crossed the threshold for being on par with Opus 4.6, and Kimi is doing a great job on shipping velocity.

goldenarm 4 hours ago | parent | next

Benchmark geometric mean

- GPT-5.5: 62.7%

- Opus 4.8: 62.2%

- Kimi K2.7 Code: 56.3%

- Kimi K2.6: 48.2%
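
The comment does not say which benchmarks feed into these aggregates, so the sketch below only illustrates the arithmetic: how a geometric mean is formed from per-benchmark scores (the example inputs are made up) and how the quoted aggregates compare with each other. The function names are hypothetical.

```python
import math


def geometric_mean(scores: list[float]) -> float:
    """Geometric mean of strictly positive scores, via the mean of their logs."""
    if not scores or any(score <= 0 for score in scores):
        raise ValueError("scores must be non-empty and strictly positive")
    return math.exp(sum(math.log(score) for score in scores) / len(scores))


# Aggregate figures exactly as quoted in the comment above.
QUOTED = {
    "GPT-5.5": 62.7,
    "Opus 4.8": 62.2,
    "Kimi K2.7 Code": 56.3,
    "Kimi K2.6": 48.2,
}


def relative_gain(new: str, old: str) -> float:
    """Relative improvement of `new` over `old`, as a fraction."""
    return QUOTED[new] / QUOTED[old] - 1.0


if __name__ == "__main__":
    # Made-up per-benchmark scores, only to show the calculation.
    print(round(geometric_mean([40.0, 90.0]), 1))
    print(f"K2.7 Code vs K2.6: {relative_gain('Kimi K2.7 Code', 'Kimi K2.6'):.1%}")
    print(f"GPT-5.5 vs K2.7 Code: {relative_gain('GPT-5.5', 'Kimi K2.7 Code'):.1%}")
```

A geometric mean penalizes a single weak benchmark more than an arithmetic mean does, so a model with uneven results ranks lower under it.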

jkwang 5 hours ago | parent | next

This maps to what I'm seeing in practice. The gap between demo and production is consistently underestimated, especially around error handling and edge cases.

fractalf 5 hours ago | parent | next

How is 2.7 a thing _now_? It's not even mentioned on Moonshot's webpage.

RobertPelloni 4 hours ago | parent | next

insanely great!
