Hacker News
There's definitely a way to use Claude code that is token conscious.

I've tried throwing unsupervised agentic software factory workflows against the wall, and they burned through my tokens like nobody's business but didn't produce much.

Supervised, human-in-the-loop process on the other hand is much more productive but doesn't consume nearly as much. Maybe that's why everyone's pushing agentic approaches so much.

Yeah. Claude does good work but reviewing it all properly takes quite a bit of time. It got to the point I started having trouble maxing out my weekly allocation.

Dealt with that by going all out and making an agentic parallel code review skill. Basically an infinite TODO list generator. Now I'm definitely getting 100% of the usage I paid for. It really burns tokens like nobody's business, and catches a lot of issues while at it. I've been looping this review/fix process every week. It's dramatically reduced the amount of stuff I need to pay attention to during my human review sessions.

I’m interested in how this works in practise - I guess you’ve written a skill to do code review, then your Claude.md file tells it to use it after every change as a bg task? So does this work as a background task while Claude is working on the next ‘feature’?
The current thinking is that automated agents are what turns this from an industry in the tens of billions to a multi-trillion-dollar one. So yes, you are right on the money: agents stimulate demand for this thing they've built.
"The bureaucracy is expanding to meet the needs of the expanding bureaucracy"
AI is expanding to meet the needs of expanding AI. Why worry about jobs? AI will provide plenty of work. If anything, I worry we'll be working more, not less. All that AI will need someone to vouch for it and to scapegoat when it makes mistakes.
I didn't know that one. Loosely said to be Oscar Wilde.
Delivered in the voice of Leonard Nimoy to millions of GenX during their formative years.
Is that a civ 4 reference? You sir, have my upvote.
There is always a quantity of lubricant that can get any machine moving. Just add so much that you create an all consuming river of lube and watch your thing sail away.
Good then that Amazon sells it by the 55 gal drum.

https://www.amazon.com/Passion-Lubes-Natural-Water-Based-Lub...

> This product is out of stock

Ah, shoot, there go my weekend plans. Bummer.

I think it's great. People at a broad scale are getting first hand experience with resource management. It's a fairly cheap way of doing it too (in contrast to: learning this by managing humans) and we can all benefit from the skill transfer.
At the enterprise level though, it's going to be hard to want to use a service in which costs are not predictable, and keeping those costs under control requires employee training.
>...use a service in which costs are not predictable, and keeping those costs under control requires employee training.

Isn't this a (mildly exaggerated) description of AWS, which is a very successful service?

Mmm… but for AWS it's pay for external use, right?

So your costs scale with the number of users you have.

That's an opex that you can explain.

For tokens for developers it's maybe closer, cost/outcome wise, to hiring an external consulting company to write your code; money paid scales with work done, no promise of delivery, arbitrary unpredictable external price changes.

It's not quite the same, though it is similarly lucrative for consultants.

>Mmm… but for AWS it's pay for external use, right?

Not if you're using it for running builds, running research jobs, model training, etc.

You can put a limit on token spend and provide training (and even pre-configured workflows) on how to limit token spend.

Like the other commenter said: cloud spend can also spin out of control if you don't pay attention, yet we've found ways to keep it under control (training, guardrails, limits, transparency).

The problem that I see is what you do if someone runs out of tokens. It doesn't really work to say "well I guess you just get fired because you can't work at full speed for the rest of the month".

Personally, this feels like it's just trying to push the work of managers in allocating resources onto developers so that they have more work to do and can be blamed if anything goes wrong.

Am I losing my mind? Aren't there multiple headlines each day about companies penalizing employees for not using AI enough?
That was roughly 3 weeks ago. With the repricing of Claude 4.7 and GPT 5.5, things have become more spicy.
2 months ago: no limits. 1 month ago we had a leaderboard for whoever had the highest token spend not taking into account what was actually produced. This week: “everyone is using opus too much, just use it for planning.”
use AI, don't use AI, this whole thing is getting really hard to follow
since those headlines started I've felt it just encouraged inefficiency. "say as much as you can without saying anything." if you were accomplishing your task the need for more would end, thus there is incentive to never succeed.
To be fair, the cost of software development has always been fairly unpredictable. What may be different is that the cost used to be roughly proportional to man-hours spent, while now the number of agents running in parallel may be less predictable.
The cost per month is 100% known and always has been. What has been variable is the rate of delivery. AI cost is different: it is not fixed per month, and it can be substantial relative to salaries in countries with lower wages.
> To be fair, the cost of software development has always been fairly unpredictable.

Yes, but in a "oops this is gonna take another two months to finish" kind of way, not the "oops this is the 12th time this month 8 developers have burned $2K in tokens in a single day and no one really knows how it happened" kind of way.

We’re all being given belt-loaded machine guns and tossed on to Planet K. We used to pay for the salaries of soldiers, now we have an Ammo Budget.
There's no fucking training to mitigate a slot machine.
that analogy is so boring now with so many real world examples of actual LLM work.

people still can't get over the unreasonable effectiveness of algorithms.

There have also been winners of a slot machine gamba, so the analogy holds quite well. I would even argue that there are considerably more slot machine gamba winners than real world examples of actual LLM work.
nondeterminism will always be anathema to the engineering mind
Odd, I train teams (at large companies) to use harnesses effectively. So some training does exist.

I get the anti/skeptic sentiment. I've been called a lot of horrible things by a vocal contingent when they hear that I help train folks to learn software engineering best practices and then apply AI to that.

There’s actually been a ton of research on how to optimize “slot machines,” at least in a generalized sense. For more reading, check out the literature on multi armed bandits.
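For anyone who hasn't met the multi-armed bandit framing, here is a minimal epsilon-greedy sketch. The function name `epsilon_greedy`, the payout probabilities, and the parameter values are all illustrative assumptions, and it assumes each "machine" has fixed odds, which may well not hold for LLM agents.

```python
import random


def epsilon_greedy(payout_probs, rounds=5000, epsilon=0.1, seed=0):
    """Play `rounds` pulls across several slot machines ("arms").

    With probability `epsilon` a random arm is explored; otherwise the arm
    with the best observed average reward so far is exploited.
    Returns (pull counts per arm, observed mean reward per arm).
    """
    rng = random.Random(seed)  # seeded so repeated runs match
    n_arms = len(payout_probs)
    pulls = [0] * n_arms
    mean_reward = [0.0] * n_arms

    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda i: mean_reward[i])  # exploit

        # Simulated payout: 1 with the arm's (hidden) probability, else 0.
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0

        pulls[arm] += 1
        # Incremental update of the running average for this arm.
        mean_reward[arm] += (reward - mean_reward[arm]) / pulls[arm]

    return pulls, mean_reward


pulls, means = epsilon_greedy([0.1, 0.5, 0.9])
print("pulls per arm:", pulls)
print("estimated payout per arm:", [round(m, 2) for m in means])
```

The point of the sketch is only that "which lever do I keep pulling" is a studied problem: the strategy spends a small fraction of pulls exploring and concentrates the rest on whatever has paid best so far.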
Games like Diablo are basically a whole bunch of slot machines, and there are strategies you can follow to optimize your run.
Yes, because in video games there is always a chance to win so you can optimize your strategy around that chance. If you have a 1% chance to drop a legendary weapon, the question becomes how do I manufacture 100 chances for a weapon drop in the shortest possible time. With agentic coding there is no such guaranteed chance - in a way it's worse than a slot machine that is guaranteed to pay out eventually. You could spend hundreds of millions of tokens and still not get what you asked for.
You’re right, the ARPG analogy isn’t great; it’s too simplistic. I was trying to come up with something heavily stochastic where people are coming up with strategies to get the odds in their favor. Maybe closer to speculating on the real estate market? But even that feels too simplistic compared to LLMs. Even the definition of a win isn’t well defined.

Actually it’s really its own thing. I don’t think the slot machine analogy works too well either: with a slot machine you have fixed odds (and you know they aren’t in your favor) and a binary output.

> If you have a 1% chance to drop a legendary weapon, the question becomes how do I manufacture 100 chances for a weapon drop in the shortest possible time.

Sidenote but I hope everyone realizes that 100 is kind of arbitrary here and does not mean the total chance to get something is 100%.
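To put numbers on that, a quick sketch assuming every attempt is independent with the same 1% drop chance (the helper names are made up for illustration):

```python
import math


def chance_of_at_least_one(p: float, attempts: int) -> float:
    """Probability of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - p) ** attempts


def attempts_needed(p: float, target: float) -> int:
    """Smallest number of independent tries that reaches `target` probability."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


# 100 tries at 1% each gives about a 63.4% chance of at least one drop.
print(round(chance_of_at_least_one(0.01, 100), 3))

# Number of tries needed for a 99% chance of at least one drop.
print(attempts_needed(0.01, 0.99))
```

So 100 attempts at 1% leaves roughly a one-in-three chance of walking away with nothing, and no finite number of attempts gets you to exactly 100%.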

LOL, that's a sophisticated and sometimes slightly unpredictable multitool.

If this is the "analogy" you go for, you don't seem to be suited to make that comparison.

> There's definitely a way to use Claude code that is token conscious.

Colleague used Sonnet 4.6 on some pretty normal agentic coding tasks through AWS Bedrock to keep the data in the EU, 100 EUR usage in a single day. In comparison, the Mistral subscription costs about 20 EUR per month and we tested that for similar tasks it was okay, the usage got to around 10% of that monthly limit in a single day. Or Anthropic's own Max (5x) plan where you get way, way more tokens to do with as you please.

I feel like the sweet spot is having a monthly subscription with any of the providers (you're subsidized a bunch), but if you have to pay per token, now I'd just look in the direction of what tasks DeepSeek would be okay for, sadly probably not in the situation above. For a startup, though...

On the other hand, this feels a bit hypocritical:

> It was part of an effort to get project managers, designers, and other employees to experiment with coding for the first time, and sources tell me that Claude Code has proved very popular inside Microsoft over the past six months.

They're gonna say that the future is all AI... until they get the bill.

I was a Mistral Le Chat Pro subscriber (the €20/month plan). Yesterday I hit my monthly limit. Switching to PAYG I burned through another €40 in one evening, working on the same project, with the same tasks.

I upgraded my plan last night to Mistral Le Chat Teams. This now costs me €60 per month for two users. Limits have been reset, but I have no idea now if my per seat limit is higher than the Pro plan, or if the limit is shared between the seats, it’s really not clear. I guess I will find out next month. The limits reset on the first of the month and I really hope I don’t hit them in the next seven days.

I use Mistral Vibe CLI and I’ve written and implemented a couple of new skills[1]. Caveman, based on an idea I found online somewhere, this skill removes all extraneous response text, including articles. Makes for some fun reading, but supposedly reduces output tokens significantly. Hash-anchors, this one is based on a concept from Dirac[2], reduces search failures and also includes multi-file dispatch. It will be hard to measure, but Vibe tells me these two should result in roughly a 40% reduction in token burn.

[1] https://codeberg.org/MimosaDev/skills

[2] https://dirac.run/

I was trying to get a better sense of the time cost quality matrix of these, so I threw together a quick eval of Sonnet 4.6, Mistral's dev model, and Opus 4.7 (figuring it's what you'd use if you were on Max).

The results for a function implementation and test of Levenshtein distance in JS are pretty similar, but Mistral is 30x cheaper than Opus 4.7 and 4x faster than Sonnet 4.6.

https://5m6qnuhyde.evvl.io/

But that's not very informative.

Levenshtein distance is not only a well-understood problem, it's small, self-contained, and extremely well-represented in the training data. The kind of problem where even small/bad models can excel. The gold standard for those tasks is just "use a library" so no wonder the beefy models are expensive: you're chartering a commercial airplane to go grocery shopping.
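For a sense of how small the task is, here is the textbook dynamic-programming version as a Python sketch. This is not the code from the linked eval (that was JS), just the standard two-row formulation:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to bound memory use.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the processed prefix of `a` and b[:j].
    previous = list(range(len(b) + 1))

    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current

    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

About twenty lines with no dependencies and no design decisions, which is why it says little about how models cope with larger multi-package work.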

My personal benchmarks are software engineering tasks (ideally spanning multiple packages in a monorepo) composed of many small decisions that, compounded, make or break the implementation and long-term maintainability.

That's where even frontier models struggle, which makes comparisons meaningful.

>> many small decisions

It’s making guesses, not decisions; framing them as decisions will lead you astray and waste time and tokens.

It’s vaguely productive to tell them a ton of relevant info upfront attempting to minimise their need for load bearing guesses. I say vaguely because obedience is generally only around the level where it's good enough to lull you into a false sense of security, not to actually be obedient.

It’s a bit more productive to use the various loop mechanisms (hooks, /goal etc) to evaluate each end of turn against guard rails and reject with clear instruction on what's unacceptable. Obviously if you only do this without the front load of info then you’re likely to spend more tokens to reach a satisfactory end of iteration.

While you are correct that something like Antigravity 2 + Opus 4.6 can handle large scale software engineering tasks, I would argue that it is usually (but not always) better "coding agent hygiene" to work on smaller code modules and as the human in the loop be a partner, not someone who prompts and then disengages.

Breaking code up into composable chunks has worked well for me over 50+ years as a professional software developer, and I can't get away from the idea that it is still usually the way to go using agentic coding tools.

The one detail I did forget to mention is that if anyone goes with the Mistral subscription (instead of paying per-token), then the Mistral Vibe tool gives you their Medium 3.5 model by default, with a 200k token context. It will probably be enough for plenty of tasks, though there's also a noticeable difference between that and up to 1M.
> They're gonna say that the future is all AI... until they get the bill.

I mean, they will continue to say so, they just want to be the ones being paid for the service, not Anthropic :)

My experience as well... I've only hit Anthropic's 5hr threshold a few times, and two of them were within a half hour of the window. Also, all three times I'd already accomplished a LOT.

I tend to work with the agent, and observe what's going on as well as review/test and work through results/changes. I spend a lot more time planning tasks/features than the execution, even using the agent as part of planning and pre-documentation. It works really well. I don't think people burning through the 5hr allotment in under an hour are actually reviewing/QC/QA the results of what they're doing in any meaningful way, and likely producing as much garbage as good (slop).

I'm really curious as to HOW the MS employees were using the agents as much as what they were doing.

I suspect subscription limits are quite a bit higher than the equivalent tokens their dollar cost could purchase. I similarly feel like I can get a lot done with a $20/mo Claude Pro subscription, but also can easily spend $10-20/day at API pricing with similar usage.
Yep. I get $6k - $8k worth of tokens (at api rates) using the $200 max subscription.
Can verify that I've gotten about $400 worth of tokens from my $20 sub.
Now that sounds like a business I’d like to invest in! When’s that Anthropic IPO anyway?
I don't understand why people are using the API pricing instead of the Pro/Max subscriptions? What am I missing?
Enterprise customers don't get that option. But also if you want a fully custom harness, you also don't get that option.
Personally I prefer the API pricing because I feel like I'm not going to get rug pulled on my work. When it comes to personal stuff, I use the shit out of my sub, but it's not making me money.
Because with Max subscriptions, you have to use the Claude Agent SDK, which is basically running Claude Code underneath. You don't get to use the chat/Messages APIs with personal subscriptions, for that you need the API pricing.
Terms of service prohibit subscriptions for employees of companies bigger than X people. I suppose they could all sign up as individuals and try to get away with it but presumably that would look pretty obvious with a tiny bit of analytics.
Anthropic is forcing large enterprises onto api billing instead of subscriptions.
> There's definitely a way to use Claude code that is token conscious.

By buying a subscription and dealing with the limits. Using Claude Code and paying per token seems like the fast lane to the poorhouse.

I get 98.6% cache hits on Claude code. Short of drastic arch changes it’s hard to imagine it getting much better.
98.6% cache hits doesn't distinguish an efficient workflow from an overly chatty linear agent repeatedly reusing the same context. Plus, it says nothing directly about whether the process makes useful progress per token.
We are all going to be graded by (tickets closed / tokens burned) soon enough.
Sweet. I can get that up to infinity, assuming they're using IEEE-754 division.
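For the record, a sketch of what that metric would do under IEEE-754's default (non-trapping) rules. Python itself raises `ZeroDivisionError` on float division by zero, so the `ieee_div` helper below is a hypothetical emulation, not a standard library function:

```python
import math


def ieee_div(a: float, b: float) -> float:
    """Divide the way IEEE-754 default arithmetic does instead of raising."""
    try:
        return a / b
    except ZeroDivisionError:
        # 0/0 and NaN/0 are NaN under IEEE-754.
        if a == 0 or math.isnan(a):
            return math.nan
        # Nonzero / zero is an infinity whose sign combines both operand signs
        # (the sign of zero matters: -0.0 flips the result).
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)


tickets_closed = 5
tokens_burned = 0.0
print(ieee_div(tickets_closed, tokens_burned))  # inf
print(ieee_div(0, tokens_burned))  # nan
```

The catch: you need at least one closed ticket with zero tokens burned. Zero tickets over zero tokens is NaN, not infinity.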
I doubt it, the difference between someone slightly inefficient and someone extremely efficient isn't big enough to matter compared to how much they cost in salary.
You pay for cache hits on every turn, and even with the newest architectures longer context is slower/more energy intensive. Constructing concise turns that reuse the prefix and stop when the new context is no longer useful helps, as does pushing generation down into cheaper models while using stronger models for verification.
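A rough sketch of why a high cache hit rate and a high bill can coexist. Everything here is an assumption for illustration: the per-million-token prices are made-up round numbers, the `session_cost` helper is hypothetical, and the model ignores cache-write surcharges and cache expiry.

```python
def session_cost(turns, new_input_tokens, output_tokens,
                 input_price=3.0, cached_price=0.3, output_price=15.0):
    """Estimate a linear chat session where every turn re-reads the whole
    prior context from cache.

    Prices are in dollars per million tokens (assumed values).
    Returns (total dollars, share of input tokens served from cache).
    """
    context = 0  # tokens accumulated from earlier turns
    cached_total = 0
    fresh_total = 0
    dollars = 0.0

    for _ in range(turns):
        dollars += (
            context * cached_price  # prior context, read from cache
            + new_input_tokens * input_price  # this turn's new input
            + output_tokens * output_price  # this turn's output
        ) / 1_000_000
        cached_total += context
        fresh_total += new_input_tokens
        context += new_input_tokens + output_tokens

    total_input = cached_total + fresh_total
    cached_share = cached_total / total_input if total_input else 0.0
    return dollars, cached_share


for turns in (10, 100):
    dollars, share = session_cost(turns, new_input_tokens=2000, output_tokens=500)
    print(f"{turns:>3} turns: ${dollars:.2f}, cached share {share:.1%}")
```

Under these assumptions the 100-turn session has a cached share above 98% yet costs about 30 times as much as the 10-turn one, because the cached prefix is re-billed on every turn and keeps growing.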
yeah, by using codex
---- Before it was:

Me: We need to do this this that.

Claude: <random stuff that approximates human output>

Me: Are you sure?

Claude: Well actually there is a bug <more random stuff that looks right this time>

----- Now it is:

Me: We need to do this this that.

Claude: <random stuff that approximates human output>

Claude: Let me consult the advisor on that.

Claude: advisor came up with some advice, adjusting according to that. <more random stuff that looks right this time>