"Frontier models" are caught in a financial dilemma of their own making --- they have spent such huge sums on development and as a result, they may have inadvertently priced themselves out of the market.
Energy costs are a huge factor for AI. He who has the lowest energy costs will likely be able to dictate market prices. And fossil fuel dependence doesn't look to be advantageous for AI.
The frontier models are going to win that way. They won't feed your code back into the system but they will track which code you keep and what code gets a "try again claude".
They're not going to lose on price. No consumer software ever has because ultimately it's not that expensive relative to salary and the marginal cost is 0.
This is true for traditional SaaS too, but the number of concurrent users that could be served by one machine and the cost of the hardware were both at least an order of magnitude better.
In other words, AI is not your daddy's software. Comparing AI with old school software markets simply does not compute.
Last week we were all talking about how Anthropic has too much demand, how they had to rent a data center from a competitor, and how the limits they’ve put on their service to deal with the demand are making users angry.
DeepSeek is cheap because they’re working hard to attract users.
The open weights models released for free weren’t free to train. It’s a loss leader to get attention to try to sell you something in the future.
The prices we pay for tokens right now are set by supply and demand, with some being sold at high premiums and others at a loss. Some models are given away for free after the companies spent money on researchers and compute.
https://openrouter.ai/deepseek/deepseek-v4-pro/providers
DeepSeek v4 Pro is much cheaper when provided by DeepSeek itself, likely as a combination of the loss leader strategy you mention and the desire to have more data flow through their pipeline for training. However, the same open weights model, served by other providers, is somewhere in the $2-3/1M output-tokens range. Compare Opus 4.7 at $25/1M output-tokens.
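To put rough numbers on that gap, here is a quick back-of-the-envelope sketch. The per-token prices are the ones quoted above; the 50M tokens/month workload and the names are made up purely for illustration.

```python
# Prices quoted in the comment above, in USD per 1M output tokens.
OPEN_WEIGHTS_LOW = 2.0   # low end of the third-party range for the open weights model
OPEN_WEIGHTS_HIGH = 3.0  # high end of that range
OPUS = 25.0              # Opus 4.7


def output_cost(tokens: int, price_per_million: float) -> float:
    """Cost in USD of generating `tokens` output tokens at the given price."""
    return tokens / 1_000_000 * price_per_million


# Hypothetical workload (an assumption, not a figure from the thread).
monthly_tokens = 50_000_000

for label, price in [
    ("open weights, low end", OPEN_WEIGHTS_LOW),
    ("open weights, high end", OPEN_WEIGHTS_HIGH),
    ("Opus 4.7", OPUS),
]:
    print(f"{label:24s} ${output_cost(monthly_tokens, price):,.2f}/month")

# How many times more expensive the frontier model is per output token.
ratio_low = OPUS / OPEN_WEIGHTS_HIGH   # against the high end of the range
ratio_high = OPUS / OPEN_WEIGHTS_LOW   # against the low end of the range
print(f"Opus is roughly {ratio_low:.1f}x to {ratio_high:.1f}x the price per output token")
```

This only compares list prices for output tokens. It says nothing about input tokens, caching discounts, or how many tokens each model needs to finish the same task.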
Unless you mean that releasing open weights models is the loss leader, in which case, you might be right but I hope you're wrong. We've seen some of this from Qwen at least - their latest model is closed only. I hope there's always someone willing to make this bet and release better and better open models.
This is specifically what I meant.
DeepSeek’s official service is trying to recoup some of the training and engineering costs too.
The other providers only have to recoup their hardware costs and the cost of a team to run it.
Even though DeepSeek’s official service is more expensive per token, they’re running at a lower profit than the OpenRouter providers because they had to pay for the R&D.
This is a deliberate choice. We already see it with Qwen splitting their releases between open weight and hosted only models. The open weights are a loss leader to get attention. Without them you’d almost never hear about their hosted models.
What would this bet be? Training is expensive and open weights mean that for hosting you compete on price with people that don't have this item on their bill.
So far, it's really only the Chinese labs (and FAIR or whatever Meta's project is called now) that are doing this. Oh yeah, and Google's Gemma.
At the moment, this is all massively distorted by the prestige and investment money flowing into the space. None of the labs have to charge the real cost of inference let alone the marginal cost of training because they are instead lighting investment money on fire to cover that.
One imagines (though I have not investigated in detail) that there's a degree of national prestige work going on too. The Chinese labs are trying to show that they can build better and more efficient models and are releasing open to undercut the US labs.
This is a good insight. I think everyone has seen that chart of China's electricity generation going parabolic vs the US. That, combined with cheaper yet equally good talent, means that at least in that segment, the closed labs won't catch up anytime soon.
Even if we all switch to Chinese models, the west isn't going to be running the model on Chinese servers... and the majority of costs are from inference.
> cheaper yet equally good talent
China has tech talent, but this isn't a 3rd world developing nation. Chinese AI researchers are getting paid $10M+ USD/year salaries.
Also they're equally good, but somehow consistently behind?
Which closed labs won’t catch up to whom?
Not to say that frontier labs won't make progress, but the bar for a sufficiently capable agent is all the OSS models need to meet to make this happen. I imagine a lot of hybrid setups where something like Opus is used only for planning/architecture; anecdotally, the real token-consuming part is implementation, not architecture.
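A rough sketch of what such a hybrid setup might cost per token. The prices are the ones quoted earlier in the thread ($25/1M output tokens for Opus, about $3/1M for a hosted open weights model). The 10% planning share is a guess for illustration, not a measurement.

```python
def blended_price(planning_share: float, frontier_price: float, open_price: float) -> float:
    """Average USD price per 1M output tokens when `planning_share` of the
    tokens come from the frontier model and the rest from an open model."""
    if not 0.0 <= planning_share <= 1.0:
        raise ValueError("planning_share must be between 0 and 1")
    return planning_share * frontier_price + (1.0 - planning_share) * open_price


FRONTIER = 25.0  # USD per 1M output tokens, as quoted in the thread
OPEN = 3.0       # USD per 1M output tokens, high end of the quoted range

# Hypothetical split: 10% of output tokens go to planning/architecture.
hybrid = blended_price(0.10, FRONTIER, OPEN)
print(f"hybrid: ${hybrid:.2f} per 1M output tokens")
print(f"saving vs all-frontier: {1 - hybrid / FRONTIER:.0%}")
```

The smaller the planning share, the closer the blended price gets to the open model's price. That is the whole appeal if implementation really is where most of the tokens go.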
Nuclear power anyone?
Currently the projects I am involved in require devs to use approaches like Ollama, Foundry Local and co if they happen to have good enough hardware, picking the best alternatives out of https://www.canirun.ai.
I feel it'll wind up like the dotcom/fiber bubble. Way too much money poured into it, lots of expensive bankruptcies or write-offs, and a readjusted market sea level.
Actually, platforms that serve many customers can bring down the costs tremendously through caching, and don’t need the AI credits as much: https://safebots.ai/costs.html
Training these neural networks every few months isn’t energy-heavy?
Both Bitcoin and these large models weren’t “designed to be energy-heavy”. It was a consequence of first-gen design decisions to solve a specific problem. Then as time went on, costs went down and they became a huge outlier in terms of energy. The question is whether the bagholders (the AI companies that invested untold amounts into the initial training) will fight to keep people using their tech and fearmonger about everything else.
Neural nets on the other hand generally show more capability as you add more compute power. There's a point where it's less valuable than the cost increase, so people don't do more than that, but it isn't constant value like Bitcoin.