edit: Found your comment about /r/localllama, but if you have anything more to add I'm still very interested.
A 120B model cannot fit on 4 x 24GB GPUs at full precision.
Either you're confusing this with the 20B model, or you have 48GB modded 3090s.
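For what it's worth, the back-of-envelope math (my own assumed bits-per-weight figures, weights only, ignoring KV cache and runtime overhead): a dense 120B model at BF16 is roughly 240 GB, far beyond the 96 GB on four 24 GB cards, while a ~4-bit quant lands around 64 GB. As I understand it, gpt-oss-120b ships its MoE weights in MXFP4, which lines up with the ~61G file in the listing below. A minimal sketch of that arithmetic:

    # Rough weight-only footprint for a 120B-parameter model at a few
    # precisions (assumed bits-per-weight values; KV cache, activations,
    # and overhead are ignored, so real requirements are higher).

    def weights_gb(params_b: float, bits_per_weight: float) -> float:
        """Approximate weight size in GB: params * bits / 8."""
        return params_b * 1e9 * bits_per_weight / 8 / 1e9

    VRAM_GB = 4 * 24  # quad 24 GB cards

    for label, bpw in [("BF16", 16.0), ("Q8_0", 8.5), ("MXFP4", 4.25)]:
        gb = weights_gb(120, bpw)
        verdict = "fits" if gb < VRAM_GB else "does not fit"
        print(f"120B @ {label:6s} ~{gb:6.1f} GB -> {verdict} in {VRAM_GB} GB of VRAM")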
seg@seg-epyc:~/models$ du -sh * /llmzoo/models/* | sort -n
4.0K metrics.txt
4.0K opus
4.0K start_llama
8.2G nvidia_Orchestrator-8B-Q8_0.gguf
12K config.ini
34G Qwen3.5-27B
47G Qwen3.5-35B
51G Qwen3.5-27B-BF16
61G gpt-oss-120b-F16.gguf
65G Qwen3.5-35B-BF16
106G Qwen3.5-122B-Q6
117G GLM4.6V
175G MiniMax-M2.5
232G /llmzoo/models/small_models
240G Ernie4.5-300B
377G DeepSeekv3.2-nolight
380G /llmzoo/models/DeepSeek-V3.2-UD
400G /llmzoo/models/Qwen3.5-397B-Q8
424G /llmzoo/models/KimiK2Thinking
443G DeepSeek-Math-v2
443G DeepSeek-V3-0324-Q5
500G /llmzoo/models/GLM5-Q5
546G /llmzoo/models/KimiK2.5

EDIT: Either they edited that to say "quad 3090s", or I just missed it the first time.
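Another way to read the listing: dividing a file size by 24 GB gives a floor on how many 24 GB cards would be needed just to hold the weights (KV cache, context length, and runtime overhead push the real number higher, and several of these are presumably meant to spill into system RAM anyway). A rough sketch using a few sizes copied from above:

    import math

    # Minimum number of 24 GB cards needed to hold the weights alone,
    # using sizes from the du listing above (a lower bound only).
    FILES_GIB = {
        "gpt-oss-120b-F16.gguf": 61,
        "Qwen3.5-122B-Q6": 106,
        "MiniMax-M2.5": 175,
        "DeepSeek-V3-0324-Q5": 443,
    }

    def min_cards(size_gib: float, per_card_gib: int = 24) -> int:
        return math.ceil(size_gib / per_card_gib)

    for name, size in FILES_GIB.items():
        print(f"{name}: ~{size} GiB -> at least {min_cards(size)} x 24 GB cards")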
Check out what other people are getting. You're welcome.
https://www.reddit.com/r/LocalLLaMA/comments/1nunq7s/gptoss1... https://www.reddit.com/r/LocalLLaMA/comments/1p4evyr/most_ec...
I was considering picking up a couple of the 48GB 4090/3090s on an upcoming trip to China, but I just ended up getting one of the Max-Qs. Maybe the token throughput would still be higher with the 4090 route, though? Impressive numbers with those 3090s!
What's the rig look like that's hosting all that?