Story Detail of id 47476883 | Liveview Hacker News

pdyc 8 hours ago | on: Flash-MoE: Running a 397B Parameter Model on a Laptop

Impressive. I wish someone would take a stab at using this technique on mobile GPUs; even if it does not use storage, it would still be a win. I am running llama.cpp on an Adreno 830 with OpenCL and I am getting a pathetic 2-3 t/s for output tokens.
