This is literally what Taalas has done with chatjimmy.ai.

Try it. It's Llama 3.1 8B at 16,000 tokens per second.

chatjimmy.ai https://taalas.com/the-path-to-ubiquitous-ai/

Wow, that's incredibly fast. I like this outcome more than centralized datacenters.