What quantisation do the creators intend this to be run at? They talk about 16GB of RAM, so should it be run at 8-bit? People here are talking about using q4, but I would have thought a smaller model like this wouldn't perform well at such low bits per parameter. Edit: it looks like their benchmarks would have been done at 16-bit float, as the Hugging Face release is that size: https://huggingface.co/google/gemma-4-12B . That seems a little deceptive: they're advertising that an 8-bit size will fit on 16GB laptops, while releasing a 16-bit size.
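For a rough sense of the sizes involved, here's a back-of-envelope sketch. It assumes exactly 12 billion parameters (read off the model name, so the real count may differ) and counts only the weights: KV cache, activations and per-block quantisation metadata are ignored. `weight_gib` is a name made up for this example.

```python
def weight_gib(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB for a given bit width.

    Assumption: every parameter is stored at the same bit width and
    nothing else (KV cache, activations, quant scales) is counted.
    """
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 12 is an assumed parameter count in billions, taken from the model name.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: {weight_gib(12, bits):.1f} GiB")
```

Under those assumptions the 16-bit weights alone come to roughly 22 GiB, 8-bit to roughly 11 GiB and 4-bit to under 6 GiB, which is why only the 8-bit and lower sizes are plausible on a 16GB machine.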

I guess we have to wait for someone to produce perplexity curves at different quantisation levels.

They haven't made one for this new model, but Unsloth has a comprehensive quant KLD map of Gemma 4 26B A4B here: https://3215535692-files.gitbook.io/~/files/v0/b/gitbook-x-p...
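For anyone unfamiliar with the metric: KLD here is the KL divergence between the next-token distribution of the full-precision model and that of the quantised one, so lower means the quant stays closer to the original. A minimal sketch of the textbook definition for a single token position follows. The logits are made-up toy values and the function names are hypothetical; this is not necessarily how Unsloth computes or aggregates their numbers.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(ref_logits, quant_logits):
    """KL(P_ref || P_quant) in nats for one token position."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


if __name__ == "__main__":
    # Toy logits over a 4-token vocabulary (illustrative values only).
    reference = [2.0, 1.0, 0.1, -1.0]
    mild_quant = [1.9, 1.1, 0.1, -1.0]
    harsh_quant = [1.0, 2.0, 0.5, -0.5]
    print("mild :", kl_divergence(reference, mild_quant))
    print("harsh:", kl_divergence(reference, harsh_quant))
```

In practice the per-token values would be averaged over a held-out text corpus, giving one number per quant that can be plotted against file size.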