Quickly deployed it to check some benchmarks relevant to the German language. These are the results for CohereLabs/include-base-44, German only: Gemma 4 12B 61.9%

  Gemma 4 26B (a4b MoE)    0.647
  Qwen 3 14B               0.621 
  Gemma 4 12B              0.618
  Ministral 14B 2512       0.604 
  Gemma 3 12B              0.547
The Qwen 3 14B vs Gemma 4 12B difference is within random variance; in some repeat runs they actually got the exact same score. The next step up, Gemma 4 31B, gets 0.676 on this. Or, letting in some reasoning, Qwen 3 14B (reasoning) also gets 0.676.
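
For a rough sense of what "within random variance" means for two scores this close, here is a minimal sketch using the binomial standard error of an accuracy. The question count `N_QUESTIONS` is a hypothetical placeholder (the size of the German subset isn't stated here), and treating the two runs as independent samples is a simplifying assumption, since both models answer the same questions.

```python
import math


def accuracy_se(acc: float, n: int) -> float:
    """Binomial standard error of an accuracy measured on n questions."""
    return math.sqrt(acc * (1.0 - acc) / n)


def diff_z(acc_a: float, acc_b: float, n: int) -> float:
    """z-score of the gap between two accuracies, each measured on n questions.

    Assumes the two measurements are independent, which is only an
    approximation when both models are scored on the same question set.
    """
    se = math.sqrt(accuracy_se(acc_a, n) ** 2 + accuracy_se(acc_b, n) ** 2)
    return (acc_a - acc_b) / se


# Hypothetical number of questions; replace with the real subset size.
N_QUESTIONS = 500

# Scores taken from the table: Qwen 3 14B vs Gemma 4 12B.
z = diff_z(0.621, 0.618, N_QUESTIONS)
print(f"standard error per score: {accuracy_se(0.621, N_QUESTIONS):.4f}")
print(f"z-score of the gap: {z:.2f}")
```

Under these assumptions the z-score stays far below the usual 1.96 threshold, which fits the observation that repeat runs sometimes give identical scores.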

I'll run some cheat-proof benchmarks tomorrow to see if Qwen is still on top.

I just ran a short tool use test and it's doing pretty well.