Hacker News new | past | comments | ask | show | jobs | submit
unfortunately the bigger models are pretty slow in token speed. The memory is just not that fast.

You can check what each model does on AMD Strix halo here:

https://kyuz0.github.io/amd-strix-halo-toolboxes/