But yeah, 4x Blackwell 6000s are ~$32-36k; not sure where the other $30k is going.
A 120B model cannot fit on 4 x 24GB GPUs at full precision.
Either you're confusing this with the 20B model, or you have 48GB modded 3090s.
EDIT: Either they edited that to say "quad 3090s", or I just missed it the first time.
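Rough back-of-envelope on why it can't fit (a sketch; weights-only, the bytes-per-parameter figures are approximate and KV cache/overhead are ignored):

```python
# Weights-only VRAM estimate for a 120B-parameter model.
# Ignores KV cache, activations, and framework overhead,
# which add on top of these numbers.
PARAMS = 120e9  # 120B parameters

def weights_gb(params, bytes_per_param):
    """Weight memory in GiB for a given precision."""
    return params * bytes_per_param / 1024**3

for label, bpp in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{label}: ~{weights_gb(PARAMS, bpp):.0f} GB")

# fp16  -> ~224 GB, far beyond 4 x 24 GB = 96 GB
# 4-bit -> ~56 GB, which does fit in 96 GB (before overhead)
```

So at fp16 you'd need well over 200GB just for weights; only with heavy quantization does 96GB of VRAM come into play.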
Check out what other people are getting. You're welcome.
https://www.reddit.com/r/LocalLLaMA/comments/1nunq7s/gptoss1... https://www.reddit.com/r/LocalLLaMA/comments/1p4evyr/most_ec...
I was considering picking up a couple of the 48GB 4090/3090s on an upcoming trip to China, but I ended up just getting one of the Max-Qs. Maybe the token throughput would still be higher going the 4090 route, though? Impressive numbers with those 3090s!
What's the rig look like that's hosting all that?