Hacker News
zozbot234 19 hours ago | on: Tinybox – A powerful computer for deep learning
Offloading the MoE layers to CPU inference is the easiest way, though it's a bit of a drag on performance.
ericd 19 hours ago
Yeah, I'd just be pretty surprised if they were getting 100 tokens/sec that way.
EDIT: Either they edited that to say "quad 3090s", or I just missed it the first time.