Hacker News new | past | comments | ask | show | jobs | submit
Anyone (including ANTHROP\C) "recommending selective use of cheaper models" is spending costly human time (which costs more over time) on correcting the machine (which costs less over time). This is a bad trade.

In cost per line of code, we have verified this is always an error unless your time is worth less than the machine (unlikely unless you consider your time to have no cost rather than considering it as your hourly rate).

The worst thing for our productivity has been Claude Code or Claude Cowork taking a complex problem and turning around and writing bad instructions for dumb model agents then synthesizing the dumb answers into an orchestra of badness.

The single best fix for results-per-total-cost is to ensure it reads and thinks about whole content, not snippets, and thinks with the smartest model, not agents.

Agents should toil. Agents should neither think*, nor decide what to think about which itself is thinking.

* Agents should “think” like ants or bees or beavers think. Any human-like thinking, *especially* intuition-like thinking, should be thought by the best model available.

** Nobody should be “churning out code”. In a hierarchy of coders who translate detailed specs to some computer language, developers who write software that ships on a project timeline, and engineers who accomplish business goals, engineers should “churn out” engines structured for business outcomes.

Measured by that, the machine is leverage while reducing a variety of costs. At the same time, because most training data doesn't grok this, the machine doesn't grok it either. So it needs you to shape its toil.

I disagree heartily with everything here, both in personal experience from the models, and in values about coding.

I don't care bout cost, I care about getting good results fast.

Cost per line of code is not a suitable metric for anything. It's as silly as measuring engineers' performance by lines of code. More lines of code is worse than fewer lines of code. When you say "we have verified" whoever that "we" is makes a big difference, but you're posting pseudonymously, how are we to even guess at that "we"?

I get better results with some older cheaper models, faster. In particular older Claude models than Opus 4.7. Maybe the more expensive model churns out more lines, more complexity faster. That is a worse outcome for me. The complexity must be avoided at all costs. The simpler, smaller, answer is always better, and scales to bigger code bases. The more the model guesses at intent rather than checking intent, the more the model is clever rather than clear and simple, the worse the outcome, the more that the model turns into an architecture astronaut, the worse the outcome.

loading story #48248702
Too many people see wages as a sunk cost and a constant. One problem though is AI costs per task are unpredictable, and management tends to prefer predictable outcomes over optimal outcomes.
> The single best fix for results-per-total-cost is to ensure it reads and thinks about whole content, not snippets, and thinks with the smartest model, not agents.

I haven't seen "just absorb a giant ball of context and do the right thing the first time" be cracked yet, even for Opus 4.7.

At the end of the day, code is code, and we have decades of lessons about how to make code more reliable and maintainable. Composable small modules, not god methods, are still the way to go, and they reward devs who use them to get focused context for agents with faster - and often better - results.