i mean this is difficult to calculate because of prompt caching, the ratio of input/output tokens etc, but if you just do some napkin math, i find it hard to believe people are getting this many tokens on a $20 plan.

here's some napkin math

gpt-oss-120b is priced at $0.039 in / $0.18 out per million tokens on OpenRouter. here are some assumptions.

1. the ratio of input/output is about 25/1. (coding is mostly grep and fairly low output)

2. you are getting 75% prompt cache reads

Case B: 50% Prompt Caching Discount (Standard Provider Rate)

At 75% Prompt Caching:

Total Tokens Obtained: 658,749,010 (approx. 659 million tokens)

Input: ~633mil

~475 mil cached at 50% input pricing = ~$9.25

~158 mil uncached = ~$6.15

Output: ~25mil tokens (~$4.50)
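For anyone who wants to check the napkin math, here is a rough Python sketch of the same calculation. It only uses the numbers stated above (the quoted prices, the 25/1 ratio, 75% cache reads billed at 50% of the input price, and a $20 budget); the function and variable names are made up for illustration, and real provider billing may well differ.

```python
# Napkin math: how many tokens a fixed budget buys at the quoted prices.
# All constants come from the comment above; names are illustrative only.
BUDGET_USD = 20.0
INPUT_PRICE = 0.039     # USD per million input tokens
OUTPUT_PRICE = 0.18     # USD per million output tokens
INPUT_PER_OUTPUT = 25   # assumption 1: 25 input tokens per output token
CACHE_HIT_RATE = 0.75   # assumption 2: 75% of input tokens are cache reads
CACHE_DISCOUNT = 0.50   # cached input billed at 50% of the input price


def tokens_for_budget(budget=BUDGET_USD):
    """Return token counts (in millions) and costs (USD) for the budget."""
    cached_price = INPUT_PRICE * (1 - CACHE_DISCOUNT)
    # Average price of one million input tokens given the cache hit rate.
    blended_input = (CACHE_HIT_RATE * cached_price
                     + (1 - CACHE_HIT_RATE) * INPUT_PRICE)
    # Cost of one million output tokens plus the input that comes with them.
    cost_per_m_output = INPUT_PER_OUTPUT * blended_input + OUTPUT_PRICE

    output_m = budget / cost_per_m_output
    input_m = output_m * INPUT_PER_OUTPUT
    cached_m = input_m * CACHE_HIT_RATE
    uncached_m = input_m - cached_m
    return {
        "total_m": input_m + output_m,
        "input_m": input_m,
        "cached_m": cached_m,
        "uncached_m": uncached_m,
        "output_m": output_m,
        "cached_cost": cached_m * cached_price,
        "uncached_cost": uncached_m * INPUT_PRICE,
        "output_cost": output_m * OUTPUT_PRICE,
    }


if __name__ == "__main__":
    for key, value in tokens_for_budget().items():
        print(f"{key}: {value:,.2f}")
```

Under those assumptions this lands on roughly 659 million total tokens for $20, which matches the breakdown above. Changing `CACHE_HIT_RATE` or `INPUT_PER_OUTPUT` shows how sensitive the total is to those two guesses.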

This doesn't even account for the profit margins of inference providers, or the fact that OpenAI probably has a much more efficient inference stack.

it's really hard to know what these companies are actually paying, but from everything i'm hearing, people are reporting that API inference pricing carries about a 50% margin.

I didn't say "use openrouter", as you might end up using subsidized resources; part of the argument is to avoid that and reach the true capital cost of inference per token (or something like that).

I meant, buy/lease the hardware that lets you run this model, run gpt-oss-120b and measure. I did this once and it was like 10x more expensive than any hosted alternative, and $20 wouldn't get you far there.
