Story Detail of id 48387965 | Liveview Hacker News

ValentineC1 day ago | on: Uber's $1,500/month AI limit is a useful signal for AI tool pricing

> I noted that my own token usage comes to about $1,000/month against each of Anthropic and OpenAI - which currently costs me just $100 per provider thanks to their generous subsidized plans for individual subscribers.

Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

Many lower-budget individuals are now moving to Chinese open-weight models like DeepSeek. I wonder if China's really subsidising the providers, or if inferencing costs are actually much lower, and Anthropic/OpenAI are just making sure no money's left on the table for their eventual IPOs.

vidarh1 day ago | parent | next

We can tell that the inferencing costs for many of these models are low enough that these models are being sold close to real costs on the basis that many of them are open weight and available from third party providers who have no incentive to subsidize them.

I think the frontier labs will need to drop their high per-token prices at least for their low and mid-level models for the reason that several Chinese models (at least Qwen, DeepSeek, Kimi and GLM) are "close enough" that with the right harness they are cost effective alternatives.

They won't necessarily need to close the gap, at least not yet, because these models won't necessarily compete at the same token counts. E.g. at least some of them need to do far more work to solve the same problems.

But, yeah, the prices will come down one way or the other.

At the same time, even the subscriptions for the cheap Chinese models are probably subsidised, and those subscriptions are likely to get less generous over time.

White_Wolf11 hours ago | root | parent | next

I really doubt Deepseek is subsidised. It's roughly the same price everywhere you look. Deepseek is using the Huawei hardware (as far as I managed to understand from various articles) and hence the savings.

bel89 hours ago | root | parent

Add MiMo 2.5 to the list. Priced like DeepSeek, performs similarly but it also has vision capability.

dgellow1 day ago | parent | next

One aspect Paul Kedrosky mentioned recently is the concept of „duration mismatch“. The price per token goes down over time (either because the AI vendor reduces it due to competitive pressure, or because customers are now incentivized to use older, cheaper models). But datacenters are financed through debt, with the assumption that their revenue increases over time. Quoting him: „[AI vendors are] paying for a fixed cost with a depreciating commodity“[0].

So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt.

0: https://youtu.be/wGZboZcSGDY?is=64GuKyqBh_4aSjTE

missedthecue1 day ago | root | parent | next

"So you have on one end the token revenue trending down, on the other end the training cost going up for the next frontier models, and you need to pay back your 10y debt."

Not necessarily, the bond holders could simply take a massive hair cut and lose shitloads of money. On the topic of bubbles and exuberance, Jeff Bezos made the salient point that there was a massive over-invested biotech boom in the 1990s and tons of sophisticated investors ended up losing lots of money. But humanity still kept the medical advancements made by the boom. Stocks going down didn't un-research drugs, and it won't un-research new GPUs or un-build datacenters.

solatic16 hours ago | root | parent | next

> Stocks going down didn't un-research drugs

Drugs cost pennies to manufacture after they are researched and make their way through the approval pipeline. There are many generic drug manufacturers who can work off the existing formulas.

The more apt comparison is that LLMs won't be un-trained. Opus 4.8 now exists. Even if Anthropic somehow went bankrupt, that particular asset could, at the very least, be sold for proverbial pennies on the dollar to a "generic" inference provider.

saalweachter11 hours ago | root | parent | next

Research does get lost over time. The whole point of the patent system is keeping that from happening; if the drug company goes bankrupt, even if they lose all their internal documentation in the process, hopefully the patents and other public paperwork provide enough information for an unrelated company -- either having acquired the patent rights, or after the patent period ends -- to reconstruct the processes with less investment than the original research.

If a bankrupt AI company maintains enough of a skeleton crew to consolidate and archive its intellectual property it could be sold off to another company, but there are also timelines where it all ends up digital dust in the wind.

eecc14 hours ago | root | parent

Or locked away in litigation for decades… See what became of the Amiga

20k15 hours ago | root | parent | next

Datacentres aren't the same as infrastructure or research though. All the hardware in them has a finite, useful lifespan. In 10 years' time it'll be totally useless.

Hardware fails, and it also ages out in terms of cost-efficiency to run as more power-efficient, modern hardware turns up. It requires constant investment to keep it useful and cost-efficient.

When AI pops, we'll temporarily have some extra compute capacity that will be horrendously uneconomical to run due to the high grid load and low consumer demand, before it gets shut down. There's simply no real use for it at this scale.

dgellow13 hours ago | root | parent | next

Those data centers are specifically for AI workloads. Let’s say everything crashes and we now have all the data centers, what do you do with them? GPUs are pretty specialized hardware; without AI, a data center full of outdated graphics cards isn’t really too valuable.

It’s really not obvious the infrastructure we are building for AI stuff is something that will benefit humanity over time.

Without talking about the fact that bubbles are extremely destructive. Bezos is obviously someone who came out ok from the dotcom bubble but we are talking about something that destroys a lot of value globally. That has real, direct consequences, not just investors losing some money. The US economy is currently only growing because of the AI bet

helloplanets9 hours ago | root | parent | next

AI data centers are already being used at max capacity, aren't they? I have a hard time imagining people would suddenly use AI less than they do as of today, let alone collectively drop it altogether. So the worst case scenario is that they'd need to be auctioned off way under what they'd be worth now, but still for someone to use them for AI.

Inference is much cheaper than training a new model, so running them just for inference is a completely different thing than having to price in the fact that at the moment all of these companies need to compromise between compute for inference and compute for training new models. If no new models were to be trained, and all the compute was inference only, that would change everything when it comes to the overall compute cost of AI.

Dotcom infra buildup is a bad comparison, in that it wasn't even close to being all utilized. The infra was completely overproportional to the day to day usage.

inemesitaffia13 hours ago | root | parent | next

You sell the GPUs to remote gaming companies.

Replace servers with regular compute.

PunchyHamster12 hours ago | root | parent

> Those data centers are specifically for AI workloads. Let’s say everything crashes and we now have all the data centers, what do you do with them?

You just run the models and sell the tokens. The demand will still be there even if there is less money in chasing new frontier models.

> GPU are pretty specialized hardware, without AI a data center full of outdated graphics cards isn’t really too valuable.

AI accelerators used in DCs are not really "graphics cards" any more; you ain't running games on them.

biztos1 day ago | root | parent | next

In order to not un-build the data centers, they at least have to make more than it costs to operate them, and also not have some attractive liquidation value (the land, maybe).

I could imagine something like “inference is done at home or in China, that’s the price to beat” and it’s not worth keeping all those GPUs cool out in Nevada.

missedthecue1 day ago | root | parent

But the parent comment was that one of the bigger costs in these data centers was the interest expense on the borrowed money. A restructuring removes or heavily reduces that amount.

The fiber laid during the dotcom bubble never paid back the investors or lenders, but it's still profitably connecting customers all these years later.

Frieren12 hours ago | root | parent

> Jeff Bezos made the salient point...

Big AI investor tells us that investing in AI is good. Oh, the surprise!

Does that invalidate this point? Yes. Because it makes no sense. The big money is not going to R&D but to build infrastructure that will be outdated in 5 years.

geysersam1 day ago | root | parent | next

Current AI datacenter/model development investment rate is roughly 1T/year. That's a lot. But the US economy is 33T/year. So the investment pays back (roughly) over ten years if, each year, the AI investments increase overall productivity by 0.6%, assuming the AI companies can capture half of the value of that productivity gain.

> „[AI vendors are] paying for a fixed cost with a depreciating commodity“

That's just a confusing way to say you don't think future models will be worth the development costs. Because if future models are significantly better, why would the price of tokens to access those models depreciate?
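
For anyone who wants to check the ten-year payback estimate above, here is a rough sketch of the arithmetic. It assumes the reading that one year's 1T of investment buys a permanent 0.6% productivity gain on a 33T economy and that AI companies capture half of it; the function name is made up for illustration.

```python
def payback_years(investment, gdp, productivity_gain, capture_share):
    """Years until captured annual value repays a one-off investment.

    All money values are in trillions of dollars per the comment's figures.
    Assumes the productivity gain is permanent and GDP stays flat.
    """
    annual_value = gdp * productivity_gain       # extra output per year
    captured = annual_value * capture_share      # share that reaches AI companies
    return investment / captured


years = payback_years(investment=1.0, gdp=33.0,
                      productivity_gain=0.006, capture_share=0.5)
print(f"{years:.1f} years")
```

With those inputs the captured value is about 0.1T per year, so the 1T comes back in roughly ten years. Halving the capture share doubles the payback time.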

timacles19 hours ago | root | parent | next

I'm surprised people think LLMs, a thing which mainly excels at advertising, spam and writing code is going to generate that much economic activity.

ashdksnndck19 hours ago | root | parent | next

Companies whose main core competency is writing code were already making up a big chunk of the economy before AI. Also, less wealthy companies were constrained in their use of software by the inability to afford the salaries of talented programmers (and ripoff practices from software consulting companies who in theory could help). Lowering the cost of building software systems ought to unblock a good amount of economic activity as the technology diffuses.

bunderbunder17 hours ago | root | parent | next

Those companies are certainly writing more code. But it isn’t clear that they are increasing their economic productivity. It could even conceivably have the opposite effect by fueling a race to the bottom.

e.g. an interesting possible canary in this coal mine is that there’s been a 200% increase in the rate of new apps appearing on Apple’s App Store, but it has not been accompanied by a 200% increase in the rate at which people are buying apps.

andwur15 hours ago | root | parent | next

The AI pundits often seem to apply the logic that code output is directly proportional to revenue and/or profit, and as such it follows that an AI usage increase leads to more code which leads to more revenue.

I don't believe this aligns with the reality of any major company: unless your business is in the literal sense "selling code", your revenue and profit are tangential to the quantity of code you produce. Google is a good example of this: most of their revenue and profit comes from their ad network, which is disconnected from their development productivity and instead heavily reliant on network effects and time in market. If I were a new competitor with infinite AI funds to throw at whatever problem I choose, I couldn't simply capture their market by developing an exact copy of Google's ad platform. In the same way, Google can't substantially grow their ad network by coding "more" or "better"; they still need more customers and consumers to interact with their network to see any increase in revenue.

So it doesn't directly follow that a productivity increase will inherently follow an AI usage increase.

timacles7 hours ago | root | parent | next

I would go as far as to say writing more code has almost no impact on their economic productivity. What drives those companies is infrastructure and networks.

therealdrag016 hours ago | root | parent

That’s great for consumers.

timacles7 hours ago | root | parent | next

You are wrong, sir. Their core competency is building out infrastructure and networks to support their software and user base. Software is by far the least complicated thing they do.

What makes YouTube YouTube is not the video player; it’s the servers that can handle petabytes of uploads a day and billions of views. YouTube, software-wise, is no different from the 100s of porn websites that are coded by small European teams.

samat10 hours ago | root | parent | next

I have yet to see those ‘companies with great ideas which simply cannot afford those very expensive developers’. For the most part, the issue is not programmer costs. Mostly it’s the inability to formulate an MVP that makes sense.

‘uber for my industry’ is not a sensible business strategy

Honestly, if you know guys whose bottleneck is pure software dev — please let me know, I have a good, experienced team in Eastern Europe, we can do wonders in product development. But coming up with sensible business ideas and executing on them in the real world is crazy hard and extremely rare.

IsTom13 hours ago | root | parent

If we're talking about Meta, Google, etc., code is only incidental to them earning money.

lesostep13 hours ago | root | parent | next

But what if it kills current ad-tech as we know it (paying to show ads on random sites without any way to verify that the site is legit), and the flow of ad money for legitimate goods turns back to journalism, magazines and other publications?

That would be half a trillion[1] redirected to regular people just from Google Ads.

[1] snatched my number from here: https://pixis.ai/blog/2025-google-advertising-benchmarks-for...

ZeroGravitas12 hours ago | root | parent

The other day I watched a YouTube video on a work machine with no history and got 2 AI generated video ads for scam products before the video played.

An AI generated man talking about his product building journey to make a pressure washer hose that didn't need power (in the AI video it didn't even have a water supply connected!) that was going to be banned in a week because it was too powerful so buy now.

I've seen AI slop before and scam ads before, but the combination of the two gave me some real tingly spider-sense that things are going to get worse and that some unethical people will make a lot of money from it and so will be in no hurry to stop it.

phillipcarter8 hours ago | root | parent

Two of the things you’ve listed are some of the most profitable activities in our economy.

timacles7 hours ago | root | parent

I mean, that says a lot about the kind of crisis our current economy is in. How much longer can the United States be a world leader when its primary function is social media and advertising?

jiggawatts23 hours ago | root | parent | next

The $1T number seems more promises than reality, which is closer to the $300B to $500B level. Still a big number, but between a third and a half of the value used in the popular media.

PunchyHamster12 hours ago | root | parent | next

The power cost increase alone on industry is gonna erase all gains from it.

You can't consider it in a vacuum. AI takes limited resources. So far it has driven up the cost of nearly every consumer electronics device that runs an OS, and it has driven up the cost of energy that is used by the entire industry and every single customer.

It's not just the cost of datacenters, it's the cost of infrastructure (that, given the current direction of the US govt, will just be paid from people's fucking taxes and bills..) and the cost of other industries turning outright unprofitable "thanks" to the demands of AI.

flextheruler22 hours ago | root | parent | next

These are similar numbers to the dotcom bubble. With GDP growth and the percentage of productivity AI contributes staying the same in this scenario, this requires regular gains in revenue or growth. If things just stumble, like with most datacenters going unbuilt, the bubble will pop.

dgellow13 hours ago | root | parent

A few things, I think you’re missing the point here

- most tasks do not require the latest frontier models, even if they are an order of magnitude more intelligent (we don’t actually know if that will be the case). Current Gemini flash is cheap, fast, and pretty capable with good guidance for most tasks

- now that companies pay API costs instead of a subscription they will be setting restrictions on token use to not have their budget explode (like Uber in this submission), that’s a strong incentive to NOT use expensive models, and to limit their thinking budget

- there is competitive pressure from China and others who can offer very decent performance at a fraction of the token price

- the price of tokens for the frontier models is likely to go up, but the price to access older models is what depreciates! The overall price per token is going down now that we are in a new world where companies understand that token maxing is one of the stupidest concepts ever created by humankind.

treis4 hours ago | root | parent | next

Relative to current usage, demand for tokens is effectively unlimited. If the price of tokens goes down, people will send more tokens to compensate. We are very, very far away from a cost per token where people run out of things they want to send through an LLM.

try-working18 hours ago | root | parent | next

If you have a good model router, you can route to older, cheaper models that run on older hardware, for simpler tasks. That helps labs extend the economic life of their hardware investments. They will likely fight it at first though as they see it as reducing ASP.

This is why I'm building role-model, a routing protocol and a router runtime: https://role-model.dev/

jurgenburgen11 hours ago | root | parent

Running cheaper models on newer hardware is always going to beat running them on older hardware.

bandrami15 hours ago | root | parent | next

The other part of that is that while price per token may be going down, tokens per task is going up

no-name-here14 hours ago | root | parent

For ~equivalent tasks/results, or because we’re expecting more or better from tasks?

The real measure should be cost per ~equivalent task result, not cost per token nor tokens per task.
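
A quick sketch of that measure: cost per task is just price per token times tokens per task, so a price cut can be wiped out by a bigger token count. The prices and token counts below are made-up illustration values, not real provider pricing.

```python
def cost_per_task(price_per_million_tokens: float, tokens_per_task: int) -> float:
    """Dollar cost of one task given a per-million-token price."""
    return price_per_million_tokens * tokens_per_task / 1_000_000


# Hypothetical numbers: the token price halves, but the harness
# now spends four times as many tokens on an equivalent task.
old = cost_per_task(price_per_million_tokens=10.0, tokens_per_task=50_000)
new = cost_per_task(price_per_million_tokens=5.0, tokens_per_task=200_000)

print(f"old: ${old:.2f} per task, new: ${new:.2f} per task")
```

In this made-up case the per-token price fell by half while the per-task cost doubled, which is why neither number alone tells you much.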

bandrami13 hours ago | root | parent

For better performance of ~equivalent tasks. That's what all the harness tooling people are using does: (often) increasing output quality by significantly increasing token counts.

alfalfasprout1 hour ago | root | parent | next

Right. Which means tokens are actually being priced well under cost once you factor in all this datacenter/GPU capex. Also worth noting the datacenters are not purely for training. They're for inference too.

Forgeties7912 hours ago | root | parent | next

I really wouldn’t be surprised if we saw some of these data centers scrapped in the next few years

bijowo16761 day ago | root | parent | next

Do GPU chips really depreciate physically? There are no moving parts; I don't think memory chips or GPU chips deteriorate naturally.

I think it's only accounting depreciation.

I have been using my laptop for a decade, what is stopping datacenters from using the purchased GPU chips for a decade?

bgnn1 day ago | root | parent | next

Chips age and fail with age. You can check hot-carrier injection, bias-temperature instability and electromigration, as they are the main aging mechanisms. All of these are a linear function of time but exponential in temperature. The 90-100C these chips are running at is really tough, so they are likely to fail at a rate in the couple-of-percent to 10% range in 2-3 years, depending on the margins they have in the design.

The solder joints are notorious for failing at a high rate too.

consp1 day ago | root | parent

If those don't go the caps and coils will eventually.

chadgpt322 hours ago | root | parent | next

those are easy and cheap to replace

jetbalsa17 hours ago | root | parent | next

Depends. The tiny SMD caps spread across the board do start to fail and go out of spec over time. They are a right pain to replace, and it's hard to spot the one that has gone out of spec and is causing the chip to start crashing.

lelanthran15 hours ago | root | parent

Not if you account for labour.

lazide23 hours ago | root | parent

Caps also age rapidly with temperature.

Aurornis1 day ago | root | parent | next

There are data centers that use and rent out 10 year old server GPUs.

They can't run larger modern models. They can't run smaller models as fast as newer servers. So their remaining market is applications where customers are okay with older, smaller models and slower performance.

They have to price the service lower than competitors due to the lower performance. The older GPUs are less efficient so it costs them more to keep them running. They're paid off, but they're taking up valuable power, space, and cooling in a data center.

Eventually there is a tipping point where it's better to replace that space and power budget with something new that has more demand.

The parts are sold off on the open market. There's an equilibrium demand for the parts from other data centers keeping older servers running and from hobby people who are okay with a jet engine sounding toaster of a GPU running in their home.

grogenaut16 hours ago | root | parent | next

Except for, you know, the enterprise customers who won't change their code and will pay to run old, inefficient hardware just to keep from dealing with upgrades?

jmalicki1 day ago | root | parent

As long as the demand for GPUs keeps increasing, there are more data centers being built to house them.

When you have waitlists for many many months for Blackwell GPUs, keeping the old ones around as long as customers are willing to pay for them is great.

If I as a customer have a use case for a machine learning model I developed a while ago, say an insect identification model that I had an ML researcher/eng develop back in 2019, and it runs fine on a 2018-era T4 GPU (NVidia 2080 era), why mess with it?

munk-a1 day ago | root | parent | next

In addition to the physical depreciation other comments mentioned, I'd also mention that old chips will settle into a low price and then actually go up on a per-unit basis if you're trying to buy a significant amount of them. With limited fabrication capacity, continuing to pump out older cards is an opportunity cost to manufacturers that would prefer to be producing newer cards. If you were in a place where you suddenly wanted to buy 10,000 3080s, as an example, I'm not certain the market could actually fulfill that demand, and no one with the ability to increase the available supply to meet that demand actually wants to do so.

Chips do wear out and need to be replaced (entropy do be like that and durability is not a primary concern for chip design) so you'll need to refresh your stock and, even if you don't need cutting edge models, the price of all chips at scale will go up over time. It may feel unintuitive since, when the PS3 was released PS1s were extremely cheap - but if you're struggling to understand this effect from your experiences in the consumer market you're actually looking at the price factor that starts making antiques increase in value since at a certain point they become scarce goods. The market price for an NES is higher today than it was in 2003 because the price had already bottomed out from demand from the general consumer market but the demand remaining (speedrunners and the like) is now fixed or growing while the supply is inevitably shrinking.

malfist1 day ago | root | parent | next

They do degrade physically, but the bigger thing is they stop being competitive quickly. Each year or so we see doubling of GPU speeds for the same amount of power.

If you build a 100MW data center with GPU compute and three years later a new data center opens with the same GPU cost and the same electricity cost as you, but can do twice as much compute, you quickly lose business unless the market is so constrained that customers can't afford to be picky. But the moment there's slack in the market you'll see major migrations off of providers that have the same cost but half, or a quarter, of the performance.

So when you see someone talking about GPUs fully depreciating in value in 1-3 years this is what they're talking about. Right now it's not a big deal because there's no slack in the market. But once there is, the bottom will drop out.
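
To put rough numbers on that, here is a small sketch of how far an old fleet falls behind at the same power budget. It assumes performance per watt doubles on a fixed schedule; the doubling period is a parameter because the figures in this thread range from about one year to about three.

```python
def relative_efficiency(age_years: float, doubling_period_years: float) -> float:
    """Compute per watt of old hardware relative to brand-new hardware.

    Assumes performance per watt doubles every `doubling_period_years`.
    """
    return 0.5 ** (age_years / doubling_period_years)


for period in (1.0, 1.5, 3.0):
    share = relative_efficiency(age_years=3.0, doubling_period_years=period)
    print(f"doubling every {period} years: 3-year-old GPUs deliver {share:.3f}x")
```

A three-year doubling period gives the "half the performance" case, and an 18-month period gives the "quarter" case for hardware that is three years old.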

tardedmeme1 day ago | root | parent | next

Gradually, and especially when hot. Modern chips are pretty close to the physical limits of how small they can be made, and that means atomic/chemical effects like electromigration are accounted for and determine the lifetime. Every extra 10 degrees Celsius of temperature doubles the speed of chemical reactions.
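
As a back-of-the-envelope sketch of that doubling rule: it is only the rule of thumb stated above, not a real reliability model, and the baseline lifetime and temperatures below are made-up illustration values.

```python
def aging_speedup(delta_t_celsius: float, doubling_step: float = 10.0) -> float:
    """Factor by which wear-out speeds up, doubling every `doubling_step` degrees."""
    return 2.0 ** (delta_t_celsius / doubling_step)


def expected_lifetime_years(base_lifetime_years: float,
                            base_temp_c: float,
                            actual_temp_c: float) -> float:
    """Scale a baseline lifetime by the rule-of-thumb speedup."""
    return base_lifetime_years / aging_speedup(actual_temp_c - base_temp_c)


# Hypothetical chip rated for 10 years at 70C, run at 90C instead.
print(aging_speedup(20.0))
print(expected_lifetime_years(10.0, 70.0, 90.0))
```

Under this rule, running 20 degrees hotter ages the part four times as fast, so the hypothetical 10-year chip lasts about 2.5 years.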

When they stray too close to the line ... you get Intel's 13/14th gen chips that wear out after 1-2 years instead of 10-20 years. Intel calls it "Vmin drift" because that doesn't sound scary, but the actual point is that various wear-out mechanisms push the chip outside of its design envelope - increasing the voltage or lowering the clock speed may get it to run for a while longer, but you're living on borrowed time as the various circuits just stop working right and you get unpredictable instruction mis-execution: https://fgiesen.wordpress.com/2025/05/21/oodle-2-9-14-and-in...

bijowo16761 day ago | root | parent

sounds like planned depreciation on Intel's part, they definitely do not design server grade chips for longevity since that would harm their own revenues

HDBaseT23 hours ago | root | parent

It was not planned depreciation, as many chips were failing even before 2 years and this impacted not only PC Builders and Gamers, but also some server infra providers too.

This was simply poor design, it took Intel ages to really figure out what went wrong and "resolve" it.

It cost them far more than it made.

chadgpt322 hours ago | root | parent

They didn't replace all the chips like with the FDIV bug though. What did it cost them? Only reputation?

tacticus21 hours ago | root | parent

Not even that in the end.

vb-84481 day ago | root | parent | next

I used to work in datacenters. During the spinning-disk era we had technicians from vendors in basically every couple of days to replace some broken part. When the massive switch to SSD happened, instead of having them every couple of days it was 3 or 4 times per month.

Despite no moving parts things broke anyway and, even if it doesn't break, the vendor can make you change the technology just by playing with maintenance cost of the older one, limiting or removing spare parts from the market.

_fizz_buzz_8 hours ago | root | parent

My understanding is that a lot of AI data centers are still heavily relying on spinning HDDs, which is why Seagate and Western Digital are selling more HDDs than ever before.

whateverboat1 day ago | root | parent | next

Today's data center GPUs are essentially overclocked, and so run at the limit of what the chip materials can physically handle, and therefore degrade over time. For example, GH200s operate at 1kW/superchip but the actual safe power is somewhere around 650W, which would allow them to function for a decade or more. But that leads to around a 15% slowdown, and that is unacceptable in today's competition. So current GPUs are destined to be depreciating assets.

In future, we might have fixed cost GPUs but not today.

missedthecue1 day ago | root | parent | next

I would presume the reason they are overclocked is because they are trying to make up for the shortage. In time, the shortage of computing components will be remedied, and tokens produced at lower power pulls will be cheaper.

bijowo16761 day ago | root | parent

I think it's reasonable to give up 15% of speed for a decade more lifetime. This depreciation change alters the economics of GPUs.

nothercastle23 hours ago | root | parent

That extra decade might provide almost no revenue. The long tail isn’t profitable

threetonesun1 day ago | root | parent | next

I assumed the issue was similar to crypto mining, where given finite amounts of space and power it makes sense to always be running the latest and most powerful GPUs instead of keeping older hardware running. There's definitely a secondary market for these GPUs as well.

mattalex1 day ago | root | parent | next

Nothing is stopping them, it's just not worth it: Have a look at e.g. vast.ai's pricing (https://vast.ai/pricing).

The V100 (2017 -> 9 years old) can be rented from $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the guy you rent it to pins it at its 250W TDP and let's ignore the running costs of CPU/RAM/etc... Then you draw 1/4 kWh for that compute hour. The industrial electricity prices in the US vary between 7.5 and 25 ct per kWh (depending on state, time of day, etc...), so at 100% efficiency, assuming nothing ever breaks, and the CPU consumes 0W you earn about 14ct/h.

And remember: V100 hours are sometimes sold at 1/10th the price.

If I pick average conditions you need to start thinking about whether it is worth it to rent them out: Usually it isn't unless you have them anyways and just sell idle capacity.

It's barely worth it to run them in a pure "is it profitable" sense; if we also account for the opportunity cost of taking up a slot in your datacenter it ceases to be worth it really quickly.
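
The margin math above can be sketched like this, using only the figures quoted in this comment (250W TDP, rental from $0.02 to $0.37/h, electricity from 7.5 to 25 ct/kWh) and the same simplification that the GPU is the only thing drawing power. The function name is just for illustration.

```python
def hourly_margin(rental_usd_per_h: float,
                  tdp_watts: float,
                  electricity_usd_per_kwh: float) -> float:
    """Rental income minus GPU electricity cost for one fully loaded hour."""
    energy_kwh = tdp_watts / 1000.0          # 250W for one hour is 0.25 kWh
    return rental_usd_per_h - energy_kwh * electricity_usd_per_kwh


for rental in (0.02, 0.165, 0.37):
    for power_price in (0.075, 0.10, 0.25):
        margin = hourly_margin(rental, 250.0, power_price)
        print(f"rent ${rental:.3f}/h, power ${power_price:.3f}/kWh: "
              f"margin ${margin:+.4f}/h")
```

The "about 14ct/h" figure falls out at $0.165/h rental and roughly 10 ct/kWh. At the $0.02/h end of the rental range the margin is close to zero with the cheapest power and negative with expensive power.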

numpad01 day ago | root | parent | next

Chips do deteriorate and fail naturally at datacenter scale or in timescales of decades, though not exactly like on financial reports. Leakage current increases or electromigration occurs at junctions, or whatever those words mean.

And yeah, it does feel like GPUs will start losing value more slowly going forward, with Moore's Law being dead for a while. It used to be that 3-5 year old GPUs were more useful as space heaters than GPUs, but that's much less the case today.

foobarian1 day ago | root | parent | next

> There are no moving parts, I don't think memory chips or GPU chips deteriorate naturally

I believe they do, but I too would love to know more details because there are several ways this can happen. Electromigration, package failures, VRAM failures, dielectric breakdown... Hopefully there will be studies soon similar to that old Google paper on HDD failures!

hgoel23 hours ago | root | parent

Currently it's a pretty big ask to look at the several hundred billion transistors and the interconnects between them to find what broke.

Though, those capabilities are maybe just a few years out, funnily it's taking AI to make it potentially doable.

dgellow1 day ago | root | parent | next

GPUs do depreciate indeed, but here the depreciating commodity is the token, not the hardware. You sell cheaper tokens with the same hardware.

xyzsparetimexyz9 hours ago | root | parent

When everything is said and done it'll be datacenters in America competing with ones in China that have several times lower electricity prices. Token prices will drop to a level that will be unprofitable for American data centers and they will need to close.

That's the main issue here.

manyatoms1 day ago | root | parent | next

The hardware itself is still useful, but random failures happen every so often, so if you're trying to run a fixed-size fleet then your fleet shrinks when you can't get spares any more.

bigfishrunning1 day ago | root | parent | next

Your laptop doesn't have a 100% duty cycle. If you ran it like a data center it would indeed wear out much faster.

ozim23 hours ago | root | parent | next

Transistors do wear out. Not going to elaborate as it is easy to ask GPT

fooker19 hours ago | root | parent | next

When it was profitable to mine crypto with GPUs people used to sell these miner GPUs on the used market after about two years.

These sold for about half the price of a used GPU that had only been used for gaming. By that price, I'd say a GPU kept busy has twice as high a chance of failure after two years of use.

Not great, not terrible.

sandworm1011 day ago | root | parent

Yes, even if the hardware is untouched. As technology advances, the power cost per compute cycle goes down. A gpu using old tech costs progressively more to operate compared to the newer models. So its value goes down over time = depreciation.

As for duty cycles, the chips are perfectly happy at 100% operation. Cooling and power components fail, not the chips. But it costs manpower to repair such things and manpower is inconvenient these days. A GPU with any sort of fault just gets dumped.
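
A rough sketch of the power-cost argument above, with entirely made-up numbers. The wattages, throughputs, and electricity price are illustrative assumptions, not measurements of any real GPU; the point is only that an older card can cost more electricity per unit of work even though it draws less power.

```python
def energy_cost_per_unit(power_watts, throughput_units_per_hour, price_per_kwh):
    """Electricity cost to produce one unit of compute (e.g. one million tokens)."""
    kwh_per_hour = power_watts / 1000.0
    return kwh_per_hour * price_per_kwh / throughput_units_per_hour


# Hypothetical cards: the newer one draws more power but does far more work.
old_gpu = energy_cost_per_unit(power_watts=400, throughput_units_per_hour=10, price_per_kwh=0.10)
new_gpu = energy_cost_per_unit(power_watts=700, throughput_units_per_hour=40, price_per_kwh=0.10)

print(f"old: ${old_gpu:.5f} per unit, new: ${new_gpu:.5f} per unit")
print(f"old card uses {old_gpu / new_gpu:.2f}x the electricity cost per unit of work")
```

Once the ratio is large enough that the old card's output cannot be sold above its running cost, its resale value is effectively gone, whether or not it still works.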

bethekidyouwant22 hours ago | root | parent

Using a shittier model is just more work for the user, I’m not sure why anyone does it, unless they’re playing with it like a toy.

SoMomentary21 hours ago | root | parent | next

Local, privacy-respecting inference can be worth it. I use a local model to log everything I do all week to automate my timesheet. I also have it do a bunch of other data tasks. I won't say that larger SOTA models wouldn't do these tasks better than a local model, but PII is a concern and my employer wouldn't approve of me just setting tokens on fire every day to do what I could do myself.

jurgenburgen11 hours ago | root | parent

> I use a local model to log everything I do all week to automate my timesheet.

Isn’t that just more work than logging it yourself?

SoMomentary5 hours ago | root | parent

Not at all! My company has 100s of clients and we track time in 6 minute increments. I feed in my browser history, terminal logs, session scripts, calendar, git commits, etc etc into it and voila it produces a highly accurate timesheet in no time flat.

Automating it has been way better for me than the alternative of breaking my flow whenever I'm switching tasks to chart my time, or logging all my hours for the week in one sitting. Different strokes for different folks I suppose.

Kaliboy21 hours ago | root | parent | next

I sometimes let Claude Opus create plans, DeepSeek v4 pro implements and writes tests. Claude reviews and corrects.

Saves like $2-3 per session. Same quality code.

no-name-here14 hours ago | root | parent | next

> more work for the user

Model routers allow this to happen automatically without any more work by the user.

> a shittier model

A ton of tasks don't require the most expensive frontier models, etc.

> I’m not sure why anyone does it

1. Faster solutions from the LLM - also reduces employee costs of having the employee waiting on the LLM

2. Avoiding things like the half-billion dollar per month bill for a single company’s LLM use recently reported in Axios
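
A minimal sketch of what such a router might look like. The model names, prices, and keyword heuristic here are all hypothetical placeholders; a real router would more likely use a trained classifier or a cheap model to judge difficulty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    usd_per_million_tokens: float


# Hypothetical tiers and prices, for illustration only.
CHEAP = Model("small-open-weight", 0.50)
FRONTIER = Model("frontier", 15.00)

# Crude stand-in for a difficulty classifier.
HARD_HINTS = ("architecture", "design", "security review", "debug")


def route(task: str) -> Model:
    """Send the task to the frontier model only when it looks hard."""
    text = task.lower()
    if any(hint in text for hint in HARD_HINTS) or len(text) > 2000:
        return FRONTIER
    return CHEAP


def estimated_cost(task: str, tokens: int) -> float:
    """Estimated spend in USD for a task that consumes `tokens` tokens."""
    return route(task).usd_per_million_tokens * tokens / 1_000_000


print(route("Rename this variable").name)
print(route("Debug this race condition").name)
```

The user submits the task once and never picks a model, which is where the "no more work" claim comes from.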

dgellow13 hours ago | root | parent

What you call a shittier model is what was considered frontier and fantastic one generation ago…

satvikpendem1 day ago | parent | next

Don't worry, they'll just lobby to ban Chinese models instead to keep their token revenues high.

> Compounding the problem, labs in China often release dual-use capable models as open-weight. Once a model is open-weight, safeguards that do exist can be removed, making the model available to any state or non-state actor to use for malicious purposes, including the cyber and CBRN misuse those safeguards were built to prevent.

https://www.anthropic.com/research/2028-ai-leadership

CuriouslyC1 day ago | root | parent | next

If you do the math, they don't have a choice. If China captures America's AI market it'll cause a major depression. They'll give it the BYD treatment, though it'll be a lot less effective.

le-mark12 hours ago | root | parent | next

> Once a model is open-weight, safeguards that do exist can be removed

Safeguards trained into the model (i.e. those that exist in the weights) can’t be removed.

throwyu820 hours ago | root | parent

China is the worst trading partner in the world. They banned most companies from functioning in their country for decades

Animats1 day ago | parent | next

Raise them, more likely. NVidia says that GPU hardware prices won't decrease until at least 2030. The world is out of fab capacity.

davedx15 hours ago | root | parent | next

Meanwhile, Google...

kristianp22 hours ago | root | parent | next

> The world is out of fab capacity.

Can anyone expand on this point? I read an article saying that the big AI co's datacentre spend was a bunch of lies because they can't build datacentres at anywhere near the rate they want to.

EA-31671 day ago | root | parent

Seriously, they’re trying to justify trillion-plus IPOs while setting piles of money on fire; prices aren’t going DOWN.

freediddy1 day ago | parent | next

Most sane US companies will disallow use of cloud-based Chinese AI providers, because everything including code, data, PII, etc is being sent to them.

eikenberry1 day ago | root | parent | next

Then don't use the cloud-based Chinese providers; use cloud-based US/EU providers serving Chinese models. The interesting Chinese models are all open, making this issue mostly moot.

ceejayoz1 day ago | root | parent | next

Saner companies ask the same question about models from their own country too.

rd1 day ago | root | parent | next

I wonder if I could start a US-based company with good data regulation and just serve open-weight models at a competitive price. I feel like the real barrier is just that most companies willing to adopt AI usage enough to make it worth it at this point don't want to be using inferior models.

amunozo1 day ago | root | parent | next

You can run DeepSeek yourself, as it's open weights, unlike Claude or GPT.

HWR_1412 hours ago | root | parent | next

Do you trust OpenAI with your code, data, PII? What makes you so sure it's not all part of the next training set anyway?

tmp104232884421 day ago | root | parent | next

There are some objections here saying that some US firms are using Chinese AI providers, but I wonder if any of those are subject to compliance. Large firms that are disproportionately responsible for AI spending are all subject to compliance.

cheeze1 day ago | root | parent | next

Deepseek has some models in Bedrock. There is definitely a huge market for a "good enough" model running within the country of the company

smoe14 hours ago | root | parent

[dead]

LastTrain21 hours ago | parent | next

> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

Raise, they are going to raise the prices. We will spend more on AI infrastructure in 2026 and 2027 than the gross sales of the entire global software and services sector. Current pricing is at a major loss for current providers.

xyzsparetimexyz9 hours ago | parent | next

Why would I even pay for DeepSeek? I get DeepSeek v4 flash for free with opencode. If I somehow run out of tokens for the day, I can just turn on my VPN.

testdelacc11 day ago | parent | next

Per token costs will fall, but the harnesses will get more token hungry. Instead of just centering the div it’ll spin up a battery of agents to architect, critique, advise, code, review, refactor and so on.

sevenzero1 day ago | root | parent | next

I wish I could disable most of these. I already hate all the "oh you're actually right, let me fix that" nonsense. Then it proceeds to burn 50k tokens on the git history instead of copying logic A from a different part of the codebase to logic B, where I want that exact logic without having to write the boilerplate myself...

KaiShips1 day ago | root | parent

[flagged]

ed_elliott_asc15 hours ago | parent | next

If Anthropic are, then they are making a big mistake; their token-hungry Claude Code is far too greedy.

bigbuppo20 hours ago | parent | next

They're going to need to bring in a few trillion dollars fast to meet Wall Street expectations. Expect prices to rise.

PunchyHamster12 hours ago | parent | next

> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

Are they even making money off them now?

SecretDreams1 day ago | parent | next

> Do we know that AI providers are going to keep these per-token prices, or eventually lower them because of competition from China?

I genuinely do not know how prices can get lower from the current major providers in NA without the whole market collapsing. Everyone is spending copious amounts of money to presumably make more money back.

aDyslecticCrow1 day ago | root | parent | next

An inference-only platform selling good open-weight model inference without the research overhead could capture a lot of the market for smaller-model use cases (Haiku, Gemini Flash). Diffusion transformers and clever caching can push inference costs even lower, and both are improving at a high rate.

The biggest reason large models are unattainable for local applications is the lack of hardware with large amounts of unified/graphics memory (and the cost of the platforms that do have it). Once the memory slog goes back to normal and hardware manufacturers adapt to demand, we may see consumer hardware with large memory capacity, effectively opening the door for slow but usable frontier model inference (assuming improvements in model efficiency and compute capacity).

At that point, inference becomes a race to the bottom. The large labs hope they can attain a leap in capability (which is increasingly looking bleak, with an average catch-up of just a few months) or market dominance through integration (integration in platforms and OSes, exclusive deals with companies or governments).

For coding agents, I suspect no player will manage to lock in enough of the market to enforce pricing much higher than the true inference cost, and catering to programmers becomes an unsustainable proposition. We will instead be hit with a lot of AI integrated into our other tooling costs, such as GitHub, Microsoft suite, and G-suite, with vendors using their market position to force in AI functions as a value-add in the total cost without giving the option to exclude them.

HDThoreaun1 day ago | root | parent

Prices can go down while tokens sold increase, so that profit increases. The labs' number one goal right now is moving past software engineers so that every white-collar worker in the country finds AI assistants indispensable. Speculation here, but I think OpenAI/Anthropic API inference is insanely profitable; it just needs more volume to amortize the training costs.
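
To make the arithmetic concrete, here is a toy calculation. Every figure is invented for illustration (the prices, serving cost, volumes, and training bill are assumptions, not reported numbers); it only shows how a lower price with higher volume can turn a loss into a profit when the fixed training cost is unchanged.

```python
def annual_profit(price_per_mtok, serving_cost_per_mtok, mtok_sold, fixed_training_cost):
    """Profit = per-token margin times volume, minus the fixed training bill."""
    margin = price_per_mtok - serving_cost_per_mtok
    return margin * mtok_sold - fixed_training_cost


# Hypothetical scenario: price drops 40%, volume quadruples.
before = annual_profit(price_per_mtok=10.0, serving_cost_per_mtok=2.0,
                       mtok_sold=100_000_000, fixed_training_cost=1_000_000_000)
after = annual_profit(price_per_mtok=6.0, serving_cost_per_mtok=2.0,
                      mtok_sold=400_000_000, fixed_training_cost=1_000_000_000)

print(f"before: ${before:,.0f}")
print(f"after:  ${after:,.0f}")
```

The argument only holds while the price stays above the serving cost per token; below that, more volume makes the loss bigger.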

hanzeweiasa8 hours ago | parent | next

[flagged]

mcmcmc1 day ago | parent | next

[dead]

cyanydeez1 day ago | parent | next

I'd be amazed if any American business will send data to China.

linkregister1 day ago | root | parent | next

HuggingFace offers DeepSeek as one of its models— it's pretty simple to spin up instances under your control.

I'm not sure about OpenRouter but I wouldn't be surprised if they offer a US-based provider of DeepSeek.

For reference, Cursor has their first own light fork of Kimi that they use as their baseline coding and review model.

alpinisme1 day ago | root | parent | next

“Any” is a very high bar. Unless laws prevent it, I don’t see why a substantial minority wouldn’t buy services from where they can get them at a similar quality and much lower price.

dkersten1 day ago | root | parent | next

Together.ai provide many open-weights models and, as far as I’m aware, their servers are US-based (the company certainly is).

lowbloodsugar1 day ago | root | parent

Any IT cost center will send to the lowest bidder. This isn’t intellectual property: it’s annoying shit that is an unwelcome cost of doing business. China might copy our tedious scripts? Will they make a product out of it? Can I buy it and fire my IT staff? Great!

Not everyone using AI is using it to code core value IP.

vinzenzu14 hours ago | parent

API prices of Anthropic, OpenAI, and Google are massively inflated.

https://martinalderson.com/posts/no-it-doesnt-cost-anthropic...

There's no way that all AI inference providers are colluding and/or all running at a massive loss, meaning the cheap Chinese model prices must be the real cost it takes to run frontier-class models PLUS their margin.

Look at Deepseek 4 Pro. https://openrouter.ai/deepseek/deepseek-v4-pro/providers Deepseek and Baidu are subsidising prices but they probably train on inputs. I have no model training and ZDR in OpenRouter enabled, and the first provider that shows up there is Deepinfra, significantly more expensive than Deepseek. BUT much cheaper than Sonnet 4.6 and ChatGPT GPT-5.4.
