Some of you folks on here love to argue: gpt-oss-120b was trained at 4-bit precision (MXFP4), so the weights pretty much take up 60 GB.
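Back-of-envelope, nothing exotic, just ~120B params at 4 bits per weight:

    # weights-only footprint at 4-bit
    params = 120e9                 # ~120B parameters (rounded)
    bytes_per_param = 4 / 8        # 4 bits = 0.5 bytes
    print(params * bytes_per_param / 1e9)   # -> 60.0 (GB)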
Good point, but you still need the KV cache and more. Fitting the model weights alone into RAM doesn't get the job done.
Yeah, it doesn't take much. I'm looking at it right now: KV cache is about 4 GB of VRAM, and the compute buffer is roughly 1.5 GB at the full 128k context.
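If anyone wants to sanity-check that number, here's the back-of-envelope. The config values are my read of the gpt-oss-120b model card (36 layers, alternating full and 128-token sliding-window attention, 8 KV heads, head_dim 64), and an fp16 cache is an assumption; llama.cpp can quantize the cache and come in lower.

    # rough KV-cache estimate for gpt-oss-120b at 128k context
    n_layers, n_kv_heads, head_dim = 36, 8, 64
    ctx, window, kv_bytes = 131072, 128, 2    # fp16 = 2 bytes/element
    full_layers = n_layers // 2               # half the layers see full context
    swa_layers = n_layers - full_layers       # rest keep only a 128-token window
    per_tok = 2 * n_kv_heads * head_dim * kv_bytes   # K + V, per layer per token
    total = per_tok * (full_layers * ctx + swa_layers * window)
    print(total / 2**30)   # -> ~4.5 GiB, in the ballpark of the 4 GB above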