Another thing is that unless you are doing really complicated stuff, you probably don't need the latest models running on high. I'm still on 5.4 medium with codex. I see very little reason to change that.
Part of agentic engineering is figuring out how to be economical with tokens and time. You can sacrifice one for the other of course. But there are diminishing returns as well where spending 10x more doesn't actually get you 10x more quality/results.
If I had to pay per token, I'd probably look at DeepSeek. In general it feels like it's a bit early for the technology - either our software methods are wasteful, or the hardware hasn't caught up. To me, it appears that we often need to throw more tokens at these problems, not less, since otherwise it's just one-shot slop.