Story Detail of id 47471283 | Liveview Hacker News

oceanplexian 1 day ago | on: Tinybox – A powerful computer for deep learning

It will work fine, but the performance isn’t necessarily insane. I can run a q4 quant of gpt-oss-120b on my Epyc Milan box, which has similar specs, and get something like 30-50 tok/sec by splitting it across RAM and GPU.

The thing that’s less useful is the 64G VRAM/128G system RAM config. Even the large MoE models only need 20B for the router; the rest of the VRAM is essentially wasted (mixing experts between VRAM and system RAM has basically no performance benefit).

syntaxing 23 hours ago | parent

Splitting between RAM and GPU impacts it more than you think. I would be surprised if the red box doesn’t outperform you by 2-3X for both PP (prompt processing) and TG (token generation).
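
For anyone who wants to put rough numbers on the split argument, here is a back-of-the-envelope sketch. It assumes token generation is purely memory-bandwidth bound, so each generated token streams the active weights once from wherever they live. The function name, its parameters, and every number in the example are illustrative assumptions, not measurements from either box, and it ignores PCIe transfers, KV cache reads, and compute time.

```python
def decode_tokens_per_sec(active_params, bytes_per_param, gpu_fraction,
                          gpu_bw_gbs, ram_bw_gbs):
    """Rough upper bound on tokens/sec for a model split across VRAM and RAM.

    Assumes each token reads every active weight exactly once and that the
    GPU-resident and RAM-resident parts are read one after the other.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")

    active_bytes = active_params * bytes_per_param

    # Time spent streaming the GPU-resident share of the active weights.
    gpu_time = active_bytes * gpu_fraction / (gpu_bw_gbs * 1e9)
    # Time spent streaming the share left in system RAM.
    ram_time = active_bytes * (1.0 - gpu_fraction) / (ram_bw_gbs * 1e9)

    return 1.0 / (gpu_time + ram_time)


if __name__ == "__main__":
    # Hypothetical inputs: 5e9 active params at ~0.5 bytes each (4-bit quant),
    # 1000 GB/s VRAM bandwidth, 200 GB/s system RAM bandwidth.
    for frac in (0.0, 0.5, 0.9, 1.0):
        rate = decode_tokens_per_sec(5e9, 0.5, frac, 1000, 200)
        print(f"{frac:.0%} of active weights in VRAM: ~{rate:.0f} tok/s ceiling")
```

Under these assumptions the slower memory pool dominates the per-token time, so the ceiling only moves far once most of the active weights sit in VRAM. Real throughput will land below the ceiling either way.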
