Story Detail of id 47471934 | Liveview Hacker News

Unfortunately, the bigger models are pretty slow in tokens per second. The memory just isn't fast enough to keep up.
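
A rough way to see why: when generating tokens, a dense model has to read essentially all of its weights from memory for every token. Decode speed is therefore capped at about memory bandwidth divided by model size in bytes. (MoE models only read their active parameters, so they do better.) The sketch below is a back-of-the-envelope estimate, not a benchmark. It assumes a ~256 GB/s theoretical peak for Strix Halo (256-bit LPDDR5X-8000) and 4-bit weights, and it ignores KV-cache traffic and real-world bandwidth efficiency, so actual numbers will be lower. The function name is just illustrative.

```python
def max_decode_tokens_per_s(params_billion: float, bits_per_weight: float,
                            bandwidth_gb_s: float) -> float:
    """Rough upper bound on decode speed for a dense model.

    Assumes every weight is read from memory once per generated token and
    ignores KV-cache reads, compute limits, and achievable vs. peak bandwidth.
    """
    model_size_gb = params_billion * bits_per_weight / 8  # GB of weights read per token
    return bandwidth_gb_s / model_size_gb


# Assumed theoretical peak for Strix Halo; measured bandwidth is noticeably lower.
BANDWIDTH_GB_S = 256.0

if __name__ == "__main__":
    for name, params_b in [("8B", 8), ("32B", 32), ("70B", 70)]:
        tps = max_decode_tokens_per_s(params_b, bits_per_weight=4,
                                      bandwidth_gb_s=BANDWIDTH_GB_S)
        print(f"{name} @ 4-bit: <= {tps:.1f} tokens/s")
```

With those assumptions, an 8B model tops out around 64 tokens/s, while a 70B model is capped at roughly 7 tokens/s before any real-world overhead.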

You can check how each model performs on AMD Strix Halo here: