Story Detail of id 48504169 | Liveview Hacker News

WarmWash 8 hours ago | on: Kimi K2.7-Code: open-source coding model with better token efficiency

Somehow the internet has also forgotten that cheating to get ahead in China is basically the norm and expected behavior.

American labs also use gamed and cherry-picked benchmarks extensively. Anthropic used them in their Fable announcement and avoided DeepSWE because it doesn't beat GPT-5.5 on that one. Google's recent numbers for Gemini 3.5 Flash did not line up at all with people's subjective experience of using the model, and the same thing happened with Gemini 3.1 Pro before it.

Everybody has incentives to manipulate benchmark results to show their models in the best light.
