I just don’t believe that this can run inference on a 120 billion parameter model at actually useful speeds.
Obviously any Turing machine can run a model of any size, so the "120B" claim doesn't mean much on its own - what actually matters is speed, and I just don't believe this can be fast enough on models that my $5,000 5090-based PC is both too slow for and lacks the VRAM to hold.
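For context, here's a rough back-of-envelope sketch of why this comes down to memory capacity and bandwidth. Every figure below is an assumption on my part (approximate 5090 specs, a guessed system-RAM bandwidth, 8-bit weights), not anything claimed on the tinygrad page:

```python
# Back-of-envelope decode-speed estimate for single-stream (batch 1)
# generation, where every token must stream the full weight set through
# memory once, so throughput is memory-bandwidth bound.
# All numbers are assumptions, not measured or vendor figures.

PARAMS = 120e9          # 120B parameters
BYTES_PER_PARAM = 1.0   # assuming 8-bit quantization
WEIGHT_BYTES = PARAMS * BYTES_PER_PARAM  # ~120 GB just for weights

def tokens_per_sec(mem_bandwidth_gb_s: float) -> float:
    """Optimistic upper bound on decode speed: one full pass over the
    weights per generated token, ignoring KV cache and activations."""
    return (mem_bandwidth_gb_s * 1e9) / WEIGHT_BYTES

# RTX 5090: ~32 GB VRAM, ~1.8 TB/s -- the model doesn't fit in VRAM
# at all, but this shows the ceiling if it somehow did.
print(f"5090 (if it fit):  {tokens_per_sec(1800):.1f} tok/s")

# Hypothetical spill to system RAM at ~100 GB/s
print(f"system RAM:        {tokens_per_sec(100):.2f} tok/s")
```

On those assumptions you get ~15 tok/s at GPU bandwidth versus well under 1 tok/s from system RAM, which is why "it can run 120B" and "it runs 120B at a useful speed" are very different claims.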
Look at the GPU and RAM specs; 120B seems workable.
For the red v2?
120B could run, but I wouldn't want to be the person who had to use it for anything.
To be fair, the 120B claim doesn't appear on the webpage. I don't know where it came from, other than the person who submitted this to HN.
It is more than fair. Also, you're comparing your $5k device to $12k and, more importantly, $65k and >$10M devices.
The "to be fair" part of my comment was saying that the tinygrad website doesn't claim 120B.
Also, nobody is comparing this box to a $10M Nvidia rack-scale deployment. They're comparing it to putting all of the same parts into their Newegg basket and putting it together themselves.