An interesting thread with speculation about how to eventually run this on local TPUs with llama.cpp and GGUF infrastructure: https://www.reddit.com/r/LocalLLaMA/comments/12o96hf/has_any...
That’s not happening. The Coral Edge TPUs are ancient and slow, don’t have enough memory to be meaningful, and somehow still manage to be relatively expensive even second-hand.
They have some good uses, but LLMs ain't it.
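Some rough numbers on the memory point, in case anyone wants to check my reasoning. This is back-of-envelope only: the ~8 MB on-chip parameter cache is Coral's published figure, and the model sizes are ballpark weight-only estimates.

    # Why LLM weights can't live on a Coral Edge TPU.
    # Assumption: ~8 MB of on-chip SRAM for cached model parameters
    # (Coral's published figure); anything bigger must stream weights
    # over the USB/PCIe link on every inference.
    EDGE_TPU_SRAM_BYTES = 8 * 1024**2

    def weight_bytes(params_billion, bits_per_weight):
        # Approximate storage for the weights alone at a given quantization.
        return int(params_billion * 1e9 * bits_per_weight / 8)

    for name, params, bits in [("TinyLlama 1.1B", 1.1, 4), ("Llama 7B", 7.0, 4)]:
        size = weight_bytes(params, bits)
        print(f"{name}: {size / 1024**2:,.0f} MB, "
              f"~{size / EDGE_TPU_SRAM_BYTES:,.0f}x the on-chip cache")

Even a 1B model at 4-bit is roughly 65x the cache, so token generation would be bottlenecked on streaming weights over the link rather than on the TPU itself.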
Ahh, the Reddit thread is referring to Edge TPU devices; I'll check it out.
Google also has Cloud TPUs, which are their server-side accelerators, and this is what we are initially trying to build for!
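If you want to sanity-check Cloud TPU access yourself, here's a minimal sketch, assuming a Cloud TPU VM with JAX installed. It's purely illustrative (list the TPU cores and run a compiled matmul on them), not our actual backend.

    import jax
    import jax.numpy as jnp

    # On a Cloud TPU VM this should print TpuDevice entries.
    print(jax.devices())

    @jax.jit  # compiled by XLA and executed on the TPU
    def matmul(a, b):
        return jnp.dot(a, b)

    # bfloat16 is the TPU's native matmul type.
    a = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
    b = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
    print(matmul(a, b).shape)  # (1024, 1024)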