Story Detail of id 48312318 | Liveview Hacker News

pants2 23 hours ago | on: Claude Opus 4.8

The Chinese models are only cheap on subsidized Chinese hosting. I have yet to find a USA-hosted Chinese model with a very clear value advantage over US models.

wg0 22 hours ago | parent | next

Not true. Also, put DeepSeek V4 Flash on your local machine with effort set to "high" and you'll see why many people run that model on their own machines without paying anyone anything.

It's just that some of us didn't imagine having GPUs would be advantageous and were not gamers on the side. Those who had beefy GPUs or GPU rigs for any reason rarely need to go anywhere else.

At least I am so impressed with DeepSeek V4, AFTER using Claude Opus 4.7 for a significant amount of time, that I am not going anywhere but DeepSeek V4.

The model is just INSANE. Things I have done with it include attempting to write a 2.5D game engine in C with full animation and map rendering layer by layer.

pants2 22 hours ago | root | parent

You'll need to spend at least $20K on a workstation that can run DS4 Flash. It would take ages to reach that much in token spend at the speeds it runs at, and if you factor electricity costs you will likely never break even vs using API.
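A rough way to sanity-check a break-even claim like this one is sketched below. Only the $20K hardware figure comes from the comment above; the throughput, API price, power draw, and electricity rate are hypothetical placeholders, so treat the result as an illustration of the arithmetic, not a measurement.

```python
def breakeven_years(hardware_usd, tokens_per_sec, api_usd_per_mtok,
                    power_watts=0.0, usd_per_kwh=0.0, utilization=1.0):
    """Years of use before local hardware beats paying per token.

    Returns None when electricity alone costs more than the API
    spend it replaces, i.e. the hardware never pays for itself.
    """
    seconds_per_year = 365 * 24 * 3600
    tokens_per_year = tokens_per_sec * utilization * seconds_per_year
    api_usd_per_year = tokens_per_year / 1e6 * api_usd_per_mtok
    kwh_per_year = power_watts / 1000 * 24 * 365 * utilization
    net_saving = api_usd_per_year - kwh_per_year * usd_per_kwh
    if net_saving <= 0:
        return None
    return hardware_usd / net_saving


# Hypothetical inputs: 40 tok/s, $0.50 per million tokens, running 24/7,
# electricity ignored. None of these numbers are from the thread.
years = breakeven_years(20_000, tokens_per_sec=40, api_usd_per_mtok=0.50)
print(f"{years:.1f} years")
```

With those placeholder inputs the payback period comes out at roughly 32 years. Raising the assumed API price or throughput shortens it proportionally, which is why the comparison depends so heavily on which hosted model you treat as the alternative.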

weitendorf 21 hours ago | parent | next

There are basically two tiers of "Chinese models" in this context, the "edge" sized ones with ~30B parameters or less, and the big ~1T models that can basically only run in the datacenter.

I don't think it's as simple as saying China's hosting is subsidized. They have generally cheaper electricity and labor costs than the US, they don't have access to the top-tier models, and they have a large internal market where the big models are the best thing they can run with what they have. So obviously they max out on their top models (which are trained with their hardware market in mind, not ours) and get the economy of scale from that, and can run generally the same hardware for less money than in the US because

The edge models are very cheap to run and can do so on inexpensive hardware. They are like 95% cheaper to run than Haiku, so the math is in their favor for certain batch workloads. Most people who do that just run the models for themselves without making them available on OpenRouter or whatever, because you can just provision a GPU node and use it as needed, and it's not that expensive to run this family of models.

Is your problem that you want to call Chinese models hosted in the US because you're worried about the data handling?

pants2 20 hours ago | root | parent

I obviously don't know the full economics of the Chinese-hosted models, but estimates[1] put the cost of hardware (servers + networking) at 70-80% of the total cost. Those things aren't meaningfully cheaper in China, so serving DeepSeek at 1/3 the cost of the cheapest US provider doesn't really compute unless it's heavily subsidized or we believe that Chinese engineers are just that much better at optimization.

Edge models, yes, they can be convenient to run batch jobs locally. I still would argue there's no economic benefit over paying for models. Haiku has a bad price/perf but others in that class are significantly cheaper in hosted APIs.

Doesn't matter what I think, the reality is that the majority of enterprises (where the real $ comes from) will not consider sending their data to China.

1. https://epoch.ai/data-insights/ai-datacenter-cost-breakdown
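The hardware-share argument in the first paragraph of this comment can be written out as arithmetic. This is a sketch that takes the 70-80% hardware share and the 1/3 price ratio from the comment at face value and assumes, purely for illustration, that every non-hardware cost could drop to zero; the function names are made up for this example.

```python
def min_relative_cost(hardware_share, hardware_discount=0.0):
    """Lowest total cost, as a fraction of the baseline, if every
    non-hardware cost dropped to zero (an illustrative best case)."""
    return hardware_share * (1 - hardware_discount)


def required_hardware_discount(target_relative_cost, hardware_share):
    """Hardware discount needed to reach the target total cost,
    still assuming all other costs are zero."""
    return 1 - target_relative_cost / hardware_share


for share in (0.70, 0.80):
    floor = min_relative_cost(share)
    discount = required_hardware_discount(1 / 3, share)
    print(f"hardware share {share:.0%}: cost floor {floor:.0%}, "
          f"discount needed for 1/3 price {discount:.0%}")
```

Under those assumptions the hardware itself would have to be roughly 52% to 58% cheaper to explain a 1/3 price, so the disagreement in this subthread comes down to whether a discount of that size on hardware is plausible.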

torginus 18 hours ago | root | parent

Hardware is arbitrarily priced, with the floor being as little money as it costs to make it, and the ceiling being how much competitors are willing to pay for it - the latter is much more of the driver of current pricing in the West than in China.

In a free market, the country would not matter, but Chinese models are often running on domestic hardware which does not directly compete with Nvidia GPUs, and thus vendors can't get away with charging as much for it.

fittingopposite 11 hours ago | root | parent

Numbers?

ekidd 22 hours ago | parent | next

The Chinese models are surprisingly cheap and performant sitting under my desk. Qwen3.6 27B is nowhere near as autonomous as Opus 4.7, but it runs in 24GB of VRAM. And it's actually great for the use cases where I'm going to carefully read and understand all the code anyway.

If you want to support a team of engineers, DeepSeek V4 Flash is antirez's current favorite. And you could support a team of engineers pretty nicely for $40-50k. Which might not make sense if you're on a Claude MAX 5x plan or the old enterprise group plan with fixed price seats. But Anthropic is switching their enterprise contracts over to token-based pricing, at which point $50k is looking pretty good.

joshhart 12 hours ago | parent | next

Fireworks will serve them for $1.74 / $0.14 / $3.48. That's input / cached input / output. https://fireworks.ai/models/deepseek-ai/deepseek-v4-pro . Call it about a third the price of Sonnet.

Not nearly as cheap as the Chinese infra but still pretty cheap.

pants2 44 minutes ago | root | parent

Sure, but Sonnet is a pretty bad deal these days - that's a similar price to Gemini 3.5 Flash and more expensive than Grok 4.3, both of which are better and faster. Those both use less than half the tokens on the Artificial Analysis Intelligence Index which means they're probably more cost efficient for many workloads.
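The token-usage point can be made concrete: what you pay per task is the per-token price times the tokens the model actually emits, so a higher list price can still work out cheaper. The prices and token counts below are hypothetical and are not taken from the Artificial Analysis index or any provider.

```python
def cost_per_task(output_usd_per_mtok, output_tokens):
    """Output-token cost of one task, in USD."""
    return output_usd_per_mtok * output_tokens / 1e6


# Hypothetical: model A is cheaper per token but uses twice the tokens.
model_a = cost_per_task(output_usd_per_mtok=3.00, output_tokens=20_000)
model_b = cost_per_task(output_usd_per_mtok=4.00, output_tokens=10_000)

print(f"A: ${model_a:.3f} per task, B: ${model_b:.3f} per task")
```

Here model B costs less per task despite the higher per-token price, which is the effect the comment is pointing at.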

harsh3195 22 hours ago | parent | next

You can find them on DeepInfra. Palo Alto company. Similarly cheap price.

pants2 22 hours ago | root | parent

Not similar. DeepInfra[1] has DS4 Pro pricing at $1.30/$2.60, which is 3X the DeepSeek[2] (Chinese) hosting at $0.435/$0.87. DeepInfra is also very slow at 37 t/s and uses an FP4 quant[3], so intelligence will be degraded slightly.

Meanwhile you could use Grok 4.3 for the same price which is smarter and 5X faster[4].

1. https://deepinfra.com/pricing

2. https://api-docs.deepseek.com/quick_start/pricing

3. https://artificialanalysis.ai/models/deepseek-v4-pro/provide...

4. https://artificialanalysis.ai/models/grok-4-3
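The "3X" figure in this comment is easy to check from the quoted prices. The snippet uses only the four numbers given above; the pricing unit is not stated in the comment, and it does not matter for the ratio.

```python
# Prices quoted in the comment above as (input, output), in USD.
deepinfra = {"input": 1.30, "output": 2.60}
deepseek = {"input": 0.435, "output": 0.87}


def price_ratios(a, b):
    """How many times more expensive `a` is than `b`, per price type."""
    return {kind: a[kind] / b[kind] for kind in a}


ratios = price_ratios(deepinfra, deepseek)
for kind, ratio in ratios.items():
    print(f"{kind}: {ratio:.2f}x")
```

Both ratios come out at about 2.99x, so the 3X claim holds for input and output alike.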

wirybeige 19 hours ago | root | parent

DS4 Pro/Flash were post-trained with QAT, so they are already quantized to FP4 for the most part. That's why the downloaded weights are much smaller than they would be at FP8 or FP16. For example, Flash is a 284B model, but its download size is only ~160GB. OFC maybe DeepInfra went even further, but there is no proof of that.
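The size argument can be checked with a quick calculation from the two numbers in the comment (284B parameters, ~160GB). The sketch assumes 1 GB = 1e9 bytes and that the download is essentially all weights; both are simplifications.

```python
def bits_per_param(size_gb, params_billions):
    """Average bits stored per parameter, taking 1 GB as 1e9 bytes."""
    return size_gb * 8 / params_billions


def size_gb_at(bits, params_billions):
    """Weight size in GB if every parameter used `bits` bits."""
    return params_billions * bits / 8


print(f"observed: {bits_per_param(160, 284):.2f} bits per parameter")
for bits in (4, 8, 16):
    print(f"{bits}-bit weights: {size_gb_at(bits, 284):.0f} GB")
```

The observed figure is about 4.5 bits per parameter, against 284 GB at 8 bits and 568 GB at 16 bits, which is consistent with mostly 4-bit weights.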

pants2 12 hours ago | root | parent

Interesting then that OpenRouter[1] tags many providers as FP8 and DeepInfra as FP4.

1. https://openrouter.ai/deepseek/deepseek-v4-pro

__mharrison__ 23 hours ago | parent | next

Odd take. I'm running them locally at my desk (DGX Spark and 128GB MBP). They work fine for 90% of what most folks do. Admittedly, they do run slower on my hw than on the cloud.

pants2 22 hours ago | root | parent

Running them locally is cool and has privacy/autonomy benefits, but you can't really make a value case for it. Guaranteed if you run the math you will never run enough inference to pay off your hardware vs buying tokens. Last time I ran the math on my MBP I'd have to run inference 24 hours a day for 5+ years to pay off the cost of my MBP, not accounting for electricity costs.

iooi 22 hours ago | root | parent | next

Is this because of the tok/s? Since it's pretty easy to run up a $5k bill in API usage for Claude/ChatGPT in a month.


slopinthebag 20 hours ago | root | parent | next

The value of not having a reliance on a third party company, and not needing an internet connection, and having total privacy: ∞

fragmede 20 hours ago | root | parent

Just have to put some numbers on privacy and autonomy. What's the fine to my company if I get hacked and leak all my customers' PII? What's the cost in lost productivity if OpenAI/Anthropic/Google decide to suspend my account for an unknown reason?

slopinthebag 20 hours ago | parent

Huh? They're several times cheaper than SOTA models at market rate prices.

pants2 20 hours ago | root | parent

If you are only looking at US hosting providers, models from US labs easily meet or beat models from Chinese labs on the same intelligence level. I'm not comparing DeepSeek with Opus because those are on different levels of performance.

slopinthebag 20 hours ago | root | parent

DeepSeek V4 Pro on US hosting is about 1.5x cheaper on input and 5x cheaper on output than Sonnet, and that's not even a fair comparison because DeepSeek is much stronger than Sonnet. It's more reasonable to compare with Opus 4.5, which is much more expensive.

pants2 17 hours ago | root | parent

Sure but you can also look at Grok 4.3, which is smarter and faster than DeepSeek at the same price point.

slopinthebag 37 minutes ago | root | parent

I doubt that is the case
