Consider a model that costs $100m to train.
If the vendor then prices it so that each inference token carries a 10% markup over the variable cost to serve it (power + server costs), whether they cover their costs depends entirely on how many tokens they can sell.
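If they sell less than about $1bn of tokens, they lose money: the break-even point is roughly 10 × $100m = $1bn (strictly $1.1bn, because a 10% markup over cost is only about a 9.1% margin on the selling price).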
If they sell $10bn of tokens, they make a ton of money: roughly $800m after paying off the training run.
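The break-even arithmetic can be sketched in a few lines. This is a toy model under stated assumptions: "10% over variable costs" is read as a markup (price = 1.1 × serving cost), so each revenue dollar contributes 0.1 / 1.1 of a dollar toward the fixed training cost. If the 10% were instead a margin on the price, the break-even would be exactly $1bn. The function names are illustrative only.

```python
def break_even_revenue(fixed_cost: float, markup: float) -> float:
    """Token revenue needed before per-token contribution covers a fixed cost.

    Assumes markup is the premium over variable serving cost (0.10 means the
    price is 1.1x cost), so each revenue dollar contributes markup / (1 + markup).
    """
    contribution_per_revenue_dollar = markup / (1 + markup)
    return fixed_cost / contribution_per_revenue_dollar


def profit(revenue: float, fixed_cost: float, markup: float) -> float:
    """Profit left after variable serving costs and the fixed training cost."""
    return revenue * markup / (1 + markup) - fixed_cost


TRAINING_COST = 100e6  # $100m, the example training cost
MARKUP = 0.10          # assumed: 10% markup over variable serving cost

print(f"break-even: ${break_even_revenue(TRAINING_COST, MARKUP) / 1e9:.2f}bn")
for revenue in (0.5e9, 1e9, 10e9):
    p = profit(revenue, TRAINING_COST, MARKUP)
    print(f"sell ${revenue / 1e9:.1f}bn -> profit ${p / 1e6:+,.0f}m")
```

Below break-even every extra token only shrinks the loss; above it, the same 10% markup turns into profit that scales linearly with volume.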
This also means you can't credibly calculate how much of the fixed training expense your token spend covers: until the model is retired and you can account for all the inference it ran, you don't know what share of the training cost each sold token was responsible for.
You also have to include failed training runs and experiments in the math.
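A minimal sketch of why the per-token share of training cost is unknowable in advance: it is the total fixed cost divided by the model's lifetime token volume, and that denominator is only known once the model is retired. The $50m for failed runs and the lifetime token counts are hypothetical numbers chosen for illustration, not figures from any vendor.

```python
def training_cost_per_token(final_run_cost: float,
                            failed_runs_cost: float,
                            lifetime_tokens: float) -> float:
    """Share of fixed cost each sold token carries.

    The fixed cost includes failed runs and experiments, not just the final
    training run. lifetime_tokens is only known after the model is retired.
    """
    total_fixed_cost = final_run_cost + failed_runs_cost
    return total_fixed_cost / lifetime_tokens


FINAL_RUN = 100e6    # $100m, the example training cost
FAILED_RUNS = 50e6   # hypothetical cost of failed runs and experiments

# Hypothetical lifetime volumes: the same model, three possible futures.
for tokens in (1e13, 1e14, 1e15):
    per_million = training_cost_per_token(FINAL_RUN, FAILED_RUNS, tokens) * 1e6
    print(f"{tokens:.0e} lifetime tokens -> ${per_million:.2f} of training cost per 1M tokens")
```

A 100x difference in lifetime volume is a 100x difference in how much training cost each token "paid for", which is why the figure can't be pinned down while the model is still being sold.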
There are no official figures, but given how fast new models are rolled out, I wouldn't be surprised if neither Anthropic nor OAI manages to recover the full cost of a model before it is replaced.
And if capabilities plateau such that training the next one is useless, then the margins will drop fast due to competition.