Just use llama.cpp, or Unsloth Studio, which wraps it. I don't know why anyone uses Ollama anymore.

I switched from llama.cpp to vLLM because of prompt cache bugs with Qwen/Gemma models.

This is a good starting issue with a bunch of linked/related