Some of you folks on here love to argue: gpt-oss-120b was trained at 4-bit precision (MXFP4), so the weights pretty much take up 60 GB.
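Back-of-envelope, nothing exotic, just ~120B params at 4 bits per weight:

    # weights-only footprint at 4-bit
    params = 120e9                 # ~120B parameters (rounded)
    bytes_per_param = 4 / 8        # 4 bits = 0.5 bytes
    print(params * bytes_per_param / 1e9)   # -> 60.0 (GB)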
Good point, but you still need the KV cache and more. Fitting the model weights alone into RAM doesn't get the job done.
Yeah, it doesn't take much. I'm looking at it right now: KV cache is about 4 GB of VRAM, and the compute buffer is roughly 1.5 GB at the full 128k context.
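If anyone wants to sanity-check that number, here's the back-of-envelope. The config values are my read of the gpt-oss-120b model card (36 layers, alternating full and 128-token sliding-window attention, 8 KV heads, head_dim 64), and an fp16 cache is an assumption; llama.cpp can quantize the cache and come in lower.

    # rough KV-cache estimate for gpt-oss-120b at 128k context
    n_layers, n_kv_heads, head_dim = 36, 8, 64
    ctx, window, kv_bytes = 131072, 128, 2    # fp16 = 2 bytes/element
    full_layers = n_layers // 2               # half the layers see full context
    swa_layers = n_layers - full_layers       # rest keep only a 128-token window
    per_tok = 2 * n_kv_heads * head_dim * kv_bytes   # K + V, per layer per token
    total = per_tok * (full_layers * ctx + swa_layers * window)
    print(total / 2**30)   # -> ~4.5 GiB, in the ballpark of the 4 GB above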