Unfortunately, there are no GGUF quants of the assistant model yet: https://huggingface.co/models?other=base_model:quantized:goo...
I think MTP support for Gemma 4 is still WIP in llama.cpp? https://github.com/ggml-org/llama.cpp/pull/23398
This has been my impression.

The underlying LiteRT-LM framework used in the Edge Gallery app does support the MTP drafters for the smaller models, but according to:

https://developers.google.com/edge/litert-lm/models/gemma-4

> Note: LiteRT-LM supports E2B and E4B models today, with support for larger models coming soon.

So even Google aren't shipping MTP support for the 26B and 31B models yet.