Yeah, I'd just be pretty surprised if they were getting 100 tokens/sec that way.
EDIT: Either they edited that to say "quad 3090s", or I just missed it the first time.
You are correct, I did forget to add "quad". You should join us in r/localllama.
Check out what other people are getting. You're welcome.
https://www.reddit.com/r/LocalLLaMA/comments/1nunq7s/gptoss1... https://www.reddit.com/r/LocalLLaMA/comments/1p4evyr/most_ec...
Thanks for the confirmation, I wasn't sure if I was just going a bit senile, heh. Yeah, I love /r/localllama, it has some of the best actual practitioners of this stuff on the internet. Also, crazy awesome frankenrigs built to try and get that many huge cards working together.
I was considering picking up a couple of the 48 GB 4090/3090s on an upcoming trip to China, but I just ended up getting one of the Max-Qs. But maybe the token throughput would still be higher with the 4090 route? Impressive numbers with those 3090s!
What's the rig look like that's hosting all that?