Story Detail of id 48245625 | Liveview Hacker News

kaoD13 hours ago | on: Microsoft starts canceling Claude Code licenses

But that's not very informative.

Levenshtein distance is not only a well-understood problem, it's small, self-contained, and extremely well-represented in the training data. The kind of problem where even small/bad models can excel. The golden standard for those tasks is just "use a library" so no wonder the beefy models are expensive: you're chartering a commercial airplane to go grocery shopping.

My personal benchmarks are software engineering tasks (ideally spanning multiple packages in a monorepo) composed of many small decisions that, compounded, make or break the implementation and long-term maintainability.

There's where even frontier models struggle, which makes comparisons meaningful.

CraigJPerry13 hours ago | parent | next

>> many small decisions

It’s making guesses not decisions, framing as decisions will lead you astray to wasted time and tokens.

It’s vaguely productive to tell them a ton of relevant info upfront attempting to minimise their need for load bearing guesses. I say vaguely because obedience is generally only around the level where it's good enough to lull you into a false sense of security, not to actually be obedient.

It’s a bit more productive to use the various loop mechanisms (hooks, /goal etc) to evaluate each end of turn against guard rails and reject with clear instruction on whats unacceptable. Obviously if you only do this without the front load of info then you’re likely to spend more tokens to reach a satisfactory end of iteration.

kaoD12 hours ago | root | parent

If I perfectly know all the guardrails I need, I don't need an LLM, only Prolog.

mark_l_watson8 hours ago | parent

While you are correct that something like Antigravity 2 + Opus 4.6 can handle large scale software engineering tasks, I would argue that it is usually (but not always) better "coding agent hygiene" to work on smaller code modules and as the human in the loop be a partner, not someone who prompts and then disengages.

Breaking code up into composable chunks has worked well for me over 50+ years as a professional software developer, and I can't get away from the idea that it is still usually the way to go using agentic coding tools.

#visit	13,336,765
#session	74,665
#live-session	0