Here's some napkin math.
gpt-oss-120b is priced at $0.039 in / $0.18 out per million tokens on OpenRouter. Here are some assumptions:
1. The input/output ratio is about 25:1 (coding is mostly grep, with fairly low output).
2. You are getting 75% prompt cache reads.
Case B: 50% prompt caching discount (standard provider rate)
At 75% prompt caching:
Total tokens obtained: 658,749,010 (approx. 659 million tokens)
Input: ~633 mil
~475 mil cached at 50% input pricing = ~$9.25
~158 mil uncached = ~$6.15
Output: ~25 mil tokens (~$4.50)
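
If you want to redo the numbers with your own assumptions, here's a rough sketch of the same arithmetic. The $20 budget is my assumption (it's what the costs above roughly add up to), and the function and parameter names are made up for illustration. Prices, ratio, cache rate and discount are the ones from this post.

```python
# Napkin math: how many tokens a fixed budget buys on gpt-oss-120b.
# Prices are the OpenRouter figures quoted in the post (USD per million tokens).
INPUT_PRICE_PER_M = 0.039
OUTPUT_PRICE_PER_M = 0.18


def tokens_for_budget(budget_usd, in_out_ratio=25, cache_rate=0.75, cache_discount=0.5):
    """Split a budget into cached input, uncached input and output tokens.

    in_out_ratio:   input tokens per output token (assumption 1 in the post)
    cache_rate:     share of input tokens served from prompt cache (assumption 2)
    cache_discount: fraction taken off the input price for cached tokens
    """
    in_price = INPUT_PRICE_PER_M / 1e6
    out_price = OUTPUT_PRICE_PER_M / 1e6
    cached_price = in_price * (1 - cache_discount)

    # Cost of one output token plus the input tokens that come with it.
    cost_per_output_token = (
        in_out_ratio * (cache_rate * cached_price + (1 - cache_rate) * in_price)
        + out_price
    )

    output_tokens = budget_usd / cost_per_output_token
    input_tokens = output_tokens * in_out_ratio
    cached = input_tokens * cache_rate
    uncached = input_tokens - cached

    return {
        "total_tokens": input_tokens + output_tokens,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "cached_cost": cached * cached_price,
        "uncached_cost": uncached * in_price,
        "output_cost": output_tokens * out_price,
    }


if __name__ == "__main__":
    # Assumed budget of $20; change it to whatever you're comparing against.
    for name, value in tokens_for_budget(20.0).items():
        print(f"{name}: {value:,.2f}")
```

With a $20 budget this lands on the same ~659 million total tokens as above. Dropping `cache_rate` to 0 shows how much the caching assumption is carrying the result.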
This doesn't even account for profit margins on inference providers, or the fact that OpenAI probably has a much more efficient inference stack.
It's really hard to know what these companies are actually paying, but from everything I'm hearing, people are reporting that API inference is priced at roughly a 50% margin.
I meant, buy/lease the hardware that lets you run this model, run gpt-oss-120b and measure. I did this once and it was like 10x more expensive than any hosted alternative, and $20 wouldn't get you far there.