Story Detail of id 48241702 | Liveview Hacker News

michaelbuckbee 23 hours ago | on: Microsoft starts canceling Claude Code licenses

I was trying to get a better sense of the time/cost/quality matrix of these, so I threw together a quick eval of Sonnet 4.6, Mistral's dev model, and Opus 4.7 (figuring it's what you'd use if you were on Max).

The results for a function implementation and test of Levenshtein distance in JS are pretty similar, but Mistral is 30x cheaper than Opus 4.7 and 4x faster than Sonnet 4.6.

https://5m6qnuhyde.evvl.io/
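
For anyone who wants a sense of how small the task is, here is a rough sketch of a Levenshtein distance function in JS. It is not the output of any model in the eval, and the name `levenshtein` is just an illustrative choice. It uses the standard two-row dynamic programming approach and compares UTF-16 code units, so it makes no attempt to handle surrogate pairs or combining characters.

```javascript
// Illustrative sketch only; not taken from the eval linked above.
// Returns the minimum number of single-character insertions, deletions,
// and substitutions needed to turn string a into string b.
function levenshtein(a, b) {
  // prev[j] = distance between the empty prefix of a and the first j chars of b.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);

  for (let i = 1; i <= a.length; i++) {
    // curr[0] = distance between the first i chars of a and the empty string.
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // delete a[i - 1]
        curr[j - 1] + 1,    // insert b[j - 1]
        prev[j - 1] + cost  // substitute, or keep on a match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3
```

A test for it could be as simple as checking a few known pairs, such as "kitten"/"sitting" (3) and an empty string against "abc" (3).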

kaoD 13 hours ago | parent | next

But that's not very informative.

Levenshtein distance is not only a well-understood problem, it's small, self-contained, and extremely well-represented in the training data. It's the kind of problem where even small/bad models can excel. The gold standard for those tasks is just "use a library", so no wonder the beefy models are expensive: you're chartering a commercial airplane to go grocery shopping.

My personal benchmarks are software engineering tasks (ideally spanning multiple packages in a monorepo) composed of many small decisions that, compounded, make or break the implementation and long-term maintainability.

That's where even frontier models struggle, which makes comparisons meaningful.

KronisLV 22 hours ago | parent

The one detail I did forget to mention is that if anyone goes with the Mistral subscription (instead of paying per-token), then the Mistral Vibe tool gives you their Medium 3.5 model by default, with a 200k token context. It will probably be enough for plenty of tasks, though there's also a noticeable difference between that and up to 1M.
