Story Detail of id 48388742 | Liveview Hacker News

goobatrooba 1 day ago | on: Gemma 4 12B: A unified, encoder-free multimodal model

Either Google changed the text or you editorialised it a tiny bit - just for all others that got excited, they mean 16GB VRAM. So a premium graphics card requiring a >2500€ device is the minimum to run this.

Still progress, but not quite democratic yet.

Weird, though, that Google might be cannibalising its own AI subscription service?
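For readers weighing the 16GB figure above, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions. It assumes 12 billion parameters (taken from the "12B" in the model name) and illustrative bit widths; the function name is hypothetical, and the estimate ignores the KV cache, activations, and runtime overhead, so real usage will be higher. It does not claim to reproduce how Google arrived at its number.

```python
def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at the same bit width.
    KV cache, activations, and framework overhead are NOT included.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # Illustrative precisions only; actual quantization formats vary.
    for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_memory_gib(12, bits):.1f} GiB for weights")
```

Under these assumptions, 16-bit weights alone would not fit in 16GB, while 8-bit or 4-bit weights would leave some headroom for context.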

LoveMortuus 23 hours ago | parent | next

I bought a laptop for <1500€ that came with 32GB of RAM and an RTX 3080 with 16GB of VRAM. So I don't think a >2500€ device is necessary, though I'm certain it would yield better and faster results.

spider-mario 4 hours ago | parent | next

Or a MacBook Air with unified memory?

thot_experiment 1 day ago | parent | next

I haven't tried this model yet, but I can run Gemma 31B w/ the MTP drafter in pure CPU at about 10tok/s, so this should run at about 20-30tok/s on a decent CPU. It'll probably run at >50tok/s on any Mac that can fit it, and lots of people have a gaming GPU with enough VRAM. In terms of access to hardware being a gate, it's one you can hop pretty easily.

dofm 1 day ago | root | parent

Could you outline how you are running the MTP drafters? I've tried LM Studio but no dice there. I'm probably missing something but I think llama.cpp and Ollama can't do it yet either?

thot_experiment 1 day ago | root | parent | next

I just build llama.cpp from source on the PR that has MTP drafters.

https://github.com/ggml-org/llama.cpp/pull/23398

Please don't use Ollama, it's a bad actor in the OSS community.

dofm 1 day ago | root | parent

I don't have the energy to build stuff all the time, that's a rabbit-hole side tunnel I don't really want to get into. I have larger concerns in my life that are more urgent than developing that side of things.

But I've moved on from Ollama for the time being, though I am mainly interested to see what the Gemma 4 MTP speeds are like on my M1 Max, so I may test it.

I am quite impressed with the tools in LM Studio, which is also a beautiful app, but it is not open source (which challenges my personal strategy somewhat) and I dread its inevitable enshittification.

Nevertheless the GUI has been very helpful while I learn, and I will probably use it until something else presents or my usage pattern settles down from experimentation to something a bit more routine.

I will try oMLX, too, but judging by the LiteRT page I may soon be able to just use that for the larger models if I end up settling with Gemma 4.

Patrick_Devine 1 day ago | root | parent | next

I haven't yet pushed the MTP enabled gemma4 12b model for Ollama because in my testing I wasn't getting a performance bump. The other gemma4 MTP models should work OK right now, but there are some fixes we're just about to push. This is specifically for the MLX backend.
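As a rough illustration of why a drafter does not always produce a speedup, here is a minimal sketch using the standard speculative-decoding estimate. It assumes each drafted token is accepted independently with a fixed probability, and that one draft step costs a fixed fraction of one target-model step. Both are simplifying assumptions, the function and parameter names are hypothetical, and none of this reflects how Ollama, MLX, or Gemma's MTP drafter are actually implemented or measured.

```python
def expected_tokens_per_step(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per target-model verification step.

    Assumes each of the draft_len drafted tokens is accepted independently
    with probability accept_rate; the target model always contributes one
    token itself (the correction or the bonus token).
    """
    if accept_rate >= 1.0:
        return float(draft_len + 1)
    return (1 - accept_rate ** (draft_len + 1)) / (1 - accept_rate)


def estimated_speedup(accept_rate: float, draft_len: int,
                      draft_cost_ratio: float) -> float:
    """Estimated speedup over plain decoding.

    draft_cost_ratio is the (assumed) cost of one draft step relative to
    one target-model step. Values below 1.0 mean drafting is a net loss.
    """
    step_cost = draft_len * draft_cost_ratio + 1
    return expected_tokens_per_step(accept_rate, draft_len) / step_cost


if __name__ == "__main__":
    # Illustrative numbers only, not measurements of any real model.
    for rate in (0.3, 0.6, 0.9):
        print(f"accept_rate={rate}: ~{estimated_speedup(rate, 4, 0.2):.2f}x")
```

With a low acceptance rate or an expensive drafter, the estimate drops to around or below 1x, which is one plausible (but unconfirmed) reason a model might show no performance bump.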

dofm 1 day ago | root | parent

Thanks for your reply. I will go back and look at Ollama again.

So much to learn, but this news has really vindicated my decision to direct my limited concentration and focus to learning how to use open-weights models and opencode.

ch_sm 1 day ago | root | parent

can't speak to compatibility with this new model, but oMLX supports MTP drafters very well.

dofm 1 day ago | root | parent

Thank you, I will test that.

ActorNightly 1 day ago | parent

Google is an advertising company first and foremost. At some point, these local models have to fit under that umbrella. I don't quite know how yet, but it's going to happen.

That being said, the real value in paid plans is that you get ecosystem integration that can read your gmail, photos, docs, and so on.

bitexploder 23 hours ago | root | parent | next

Google is also a cloud provider. Cloud is now ~18% of Google. While Google is an advertising juggernaut, Cloud is also growing rapidly, so the local models simply fit in as AI research and dev, and as a way of getting more people onto Gemini models. They /are/ advertising, effectively :)

jpadkins 23 hours ago | root | parent

Local models still need information retrieval.
