Are you including capex when you say "cost"? Or are you just looking at inference costs?
It doesn't make sense to include the capex cost of training a model in this kind of discussion, because that cost is fixed: it doesn't change with how many tokens the model ends up serving.
Consider a model that costs $100m to train.
If the vendor then prices it so that 10% of each inference token's price is margin over the variable cost to serve (power + server costs), whether or not they cover their costs depends entirely on how many tokens they can sell.
If they sell less than $1bn of tokens, they lose money: the break-even point is $100m / 10% = $1bn.
If they sell $10bn of tokens, they make a ton of money ($1bn of margin, or $900m after paying off the training run).
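A quick sketch of that break-even arithmetic, assuming the 10% is read as a share of each token's price (a gross margin). The function names are made up for illustration. If you read the 10% as a markup on cost instead, pass `0.1 / 1.1` (about 9.1%) as the margin and break-even moves to roughly $1.1bn.

```python
def break_even_revenue(training_cost: float, gross_margin: float) -> float:
    """Token revenue needed before margin covers the fixed training cost.

    gross_margin is the fraction of each token dollar left over after the
    variable cost to serve (power + servers), e.g. 0.10 for 10%.
    """
    return training_cost / gross_margin


def net_profit(token_revenue: float, training_cost: float, gross_margin: float) -> float:
    """Margin earned on token sales minus the fixed training cost."""
    return token_revenue * gross_margin - training_cost


TRAINING_COST = 100e6  # $100m, the example figure above
MARGIN = 0.10          # assumption: 10% of each token's price is margin

print(f"break-even: ${break_even_revenue(TRAINING_COST, MARGIN) / 1e9:.1f}bn")
for revenue in (0.5e9, 1e9, 10e9):
    result = net_profit(revenue, TRAINING_COST, MARGIN)
    print(f"${revenue / 1e9:g}bn of tokens -> net {result / 1e6:+,.0f}m USD")
```

The training cost only shows up as a constant subtracted at the end, which is the point: it never changes the per-token economics, only how many tokens you need to sell.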
This also means you can't credibly calculate how much of the fixed training expense your token spend covers. Until the model is retired and you can account for how much inference it ran, you don't know what share of the training cost each sold token was responsible for.
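To make the attribution problem concrete, here is a sketch that assumes straight-line amortization (every token served over the model's life carries an equal slice of the training cost). That allocation rule and the lifetime token counts below are hypothetical; the real lifetime volume is only known after retirement.

```python
def training_cost_per_million_tokens(training_cost: float, lifetime_tokens: float) -> float:
    """Straight-line amortization: spread the fixed training cost evenly
    over every token the model serves before it is retired."""
    return training_cost / lifetime_tokens * 1e6


TRAINING_COST = 100e6  # $100m, the example figure above

# Hypothetical lifetime volumes; each one implies a very different per-token share.
for lifetime_tokens in (1e12, 1e14, 1e16):
    share = training_cost_per_million_tokens(TRAINING_COST, lifetime_tokens)
    print(f"{lifetime_tokens:.0e} lifetime tokens -> ${share:,.2f} of training cost per 1M tokens")
```

The same million tokens "pays for" anywhere from $100 down to one cent of training depending on a number nobody has yet, so any per-token figure quoted mid-life is a guess about future volume.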