Story Detail of id 48387695 | Liveview Hacker News

senko 1 day ago | on: Gemma 4 12B: A unified, encoder-free multimodal model

I ran the Q4 quant (used with llama.cpp) through my "minesweeper" vibe-coding benchmark: https://senko.net/vibecode-bench/2026/minesweeper-gamma-4-12...

The result is decent, but it had a few bizarre/trivial syntax errors I had to fix manually: it would add an extra closing bracket or paren a few times, and wanted to separate function definitions with a comma. Not sure what that was about, but otherwise the output ran just fine.

So, with those qualifiers, I think it's a decent local coding model. It roughly compares with GPT-4.1 (!!), released 14 months ago, on the output: https://senko.net/vibecode-bench/2025/minesweeper-gpt-4.1.ht... (actually I'd call it better, but those syntax errors...)

I ran the quantized version (4-bit GGUF) on my consumer-grade card with 12GB of VRAM and got 5 t/s for output. Too slow for interactive coding, but a fairly capable model.

To me, it's fascinating how much progress we've made in just over a year. GPT-4.1 was considered an extremely capable coding model. Now we've got something with 12B params performing roughly the same (in this specific benchmark, disclaimers, etc).

List of the various models I tested: https://senko.net/vibecode-bench/

0xbadcafebee 1 day ago | parent | next

It was almost certainly not trained for coding, as it's got both audio and vision input, is only 12B, and nowhere in the announcement is coding mentioned. It will likely not have good performance on coding in general, compared to other small models like Qwen 3.6 35B A3B, Gemma 4 26B A4B, Nvidia Nemotron 3 Nano 30B-A3B, gpt-oss-20b.

For 16GB laptops, Qwen 3.5 9B is the undisputed champ.

Gemma 4 31B is the top dog at small model coding, but is dense so it needs ~48GB unified RAM for full context. If you want decent coding on a laptop you need a lot of RAM. But this shouldn't be surprising, dev machines have always needed lots of resources.

dirkg 16 hours ago | root | parent | next

> For 16GB laptops, Qwen 3.5 9B is the undisputed champ.

You can run Qwen 3.6 35B A3B on a 12-16GB VRAM GPU and it works pretty well.

https://www.youtube.com/watch?v=8F_5pdcD3HY&t=1s

even the 27B in some quants can fit.

https://www.reddit.com/r/LocalLLaMA/comments/1tkmgwj/qwen27b...

Qwen IMO is far better for coding, esp. agentic coding; combined with something like Pi, it probably comes close enough to Sonnet for a lot of use cases.

Gemma family is better for almost all other tasks you'd use a local llm for.

ricardobayes 8 hours ago | root | parent | next

You can run it, however those low quantized models (iQ2, iQ4, Q2) will very likely underperform the 9B versions at Q6/Q8.
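The size side of that trade-off can be estimated with simple arithmetic: weight memory is roughly parameter count times bits per weight. A minimal sketch follows, assuming round, illustrative bits-per-weight values (not measured GGUF file sizes) and ignoring KV cache and runtime overhead. The function name `weight_gib` is hypothetical. It says nothing about output quality, which is the point the comment above is making.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed, rounded bits-per-weight values for illustration only;
# real quant formats vary and add per-block metadata.
candidates = {
    "35B at ~2.5 bits/weight": (35, 2.5),
    "35B at ~4.5 bits/weight": (35, 4.5),
    "9B at ~6.5 bits/weight": (9, 6.5),
    "9B at ~8.5 bits/weight": (9, 8.5),
}

for name, (params_billion, bpw) in candidates.items():
    print(f"{name}: {weight_gib(params_billion, bpw):.1f} GiB of weights")
```

Under these assumptions, even a very low-bit quant of the 35B model takes more memory than the 9B model at a high-bit quant, so fitting the larger model comes at the cost of aggressive quantization.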


selicos 5 hours ago | root | parent | next

I want to try a hybrid setup of Gemma 4 E4B with lots of context for general use, then Qwen 3.5 9B or larger for coding. Strix Halo set up this weekend, which may enable even larger Qwen models with tons of context.

dofm 7 hours ago | root | parent

The larger Gemma models are quite good at PHP. I would not be surprised if that was a training objective; it's one of the more consumer-focussed programming languages. They have very good knowledge of WordPress hooks.

dotancohen 22 hours ago | root | parent | next

  > For 16GB laptops, Qwen 3.5 9B is the undisputed champ.

You seem like the guy to ask. For a laptop with 12GB VRAM (RTX 5070) and 32 GB system RAM, what is a good multilingual (English, Hebrew, Greek) model for conversing with personal notes in Org mode format? I don't care how long updating the model or RAG takes, and even inference can be reasonably slow, but the results of the query as they relate to my personal notes are important. I don't care about general knowledge; for those questions I can use e.g. ChatGPT.

Thanks

akmarinov 16 hours ago | root | parent | next

Join us over on Reddit at r/LocalLLaMA to get 10 different opinions on that


nl 10 hours ago | root | parent | next

Qwen 3.5 35B A3

Qwen models are always good. The 35B A3 model is an MoE model, which means it has higher performance in RAM-constrained environments compared to the 27B dense model (which is better at coding).

I don't have the experience to rate its Hebrew or Greek performance, but apparently it's not bad.
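One common way to reason about the MoE versus dense point above is a memory-bandwidth bound on decoding: each generated token has to read roughly all *active* weights once. The sketch below is a rough estimate under stated assumptions, not a benchmark. It assumes about 3B active parameters for the MoE model (reading "A3" that way), about 4.5 bits per weight, and a hypothetical 100 GB/s of memory bandwidth. It ignores KV cache reads, expert routing, and compute limits, so the results are upper bounds only. The function name is hypothetical.

```python
def decode_tokens_per_sec(active_params_billion: float,
                          bits_per_weight: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 100.0   # hypothetical memory bandwidth, for illustration
BITS_PER_WEIGHT = 4.5    # assumed, roughly a 4-bit quant

moe = decode_tokens_per_sec(3, BITS_PER_WEIGHT, BANDWIDTH_GB_S)     # ~3B active (assumed)
dense = decode_tokens_per_sec(27, BITS_PER_WEIGHT, BANDWIDTH_GB_S)  # all 27B active

print(f"MoE bound:   {moe:.1f} tokens/s")
print(f"Dense bound: {dense:.1f} tokens/s")
print(f"Ratio:       {moe / dense:.1f}x")
```

Note that the MoE model still needs memory for all 35B parameters; the advantage in this estimate is per-token speed, which matters most when part of the model sits in slower system RAM.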

sourcecodeplz 19 hours ago | root | parent | next

Any Gemma 4 model; they are great at translation and multilingual tasks.

silversmith 16 hours ago | root | parent | next

For the biggest languages, Spanish, French, maybe.

For smaller ones like my native Latvian, the output could be confused for a good translation from across the room; the words do look like Latvian words. But the quality is Google Translate circa 20 years ago, tops.

It could probably do a decent enough translation to English, if all you need is to get the gist of text. But for smaller European language outputs, nothing comes close to Gemini.


emmelaich 18 hours ago | root | parent | next
