verdverm 1 day ago | on: Gemma 4 12B: A unified, encoder-free multimodal model

I switched from llama.cpp to vLLM because of prompt cache bugs with Qwen/Gemma models.

This is a good starting issue, with a bunch of linked/related ones:

https://github.com/ggml-org/llama.cpp/issues/22746