Hacker News
Aurornis
5 hours ago
|
on: Flash-MoE: Running a 397B Parameter Model on a Laptop
It depends on the model and the mix. For some recent MoE models, offloading part of the processing to the CPU has been reasonably fast. The GPU's speed still contributes a lot, as long as its share of the compute isn't too small.
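To put rough numbers on that last point, here is a back-of-the-envelope sketch. It assumes the GPU and CPU parts of each token run one after the other with no overlap, and the function name and all speeds are made up for illustration, not measurements from any real model or machine.

```python
def tokens_per_second(gpu_fraction, gpu_speed, cpu_speed):
    """Estimate throughput when part of the per-token work runs on the GPU.

    gpu_fraction: share of per-token work done on the GPU (0.0 to 1.0)
    gpu_speed, cpu_speed: tokens/s each device would reach doing all the work alone
    Assumption: the two shares run sequentially, with no overlap or transfer cost.
    """
    if not 0.0 <= gpu_fraction <= 1.0:
        raise ValueError("gpu_fraction must be between 0 and 1")
    time_per_token = gpu_fraction / gpu_speed + (1.0 - gpu_fraction) / cpu_speed
    return 1.0 / time_per_token


# Hypothetical speeds: CPU alone 5 tok/s, GPU alone 50 tok/s (or 100 tok/s for a faster card).
for share in (0.1, 0.5, 0.9):
    slow_gpu = tokens_per_second(share, 50.0, 5.0)
    fast_gpu = tokens_per_second(share, 100.0, 5.0)
    print(f"GPU share {share:.0%}: {slow_gpu:.1f} tok/s -> {fast_gpu:.1f} tok/s with a 2x faster GPU")
```

Under these assumptions, doubling GPU speed barely moves the result when the GPU only handles 10% of the work, but it helps noticeably at 90%. Real setups also pay for transfers and may overlap CPU and GPU work, so treat this as intuition only.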