Hacker News

Show HN: Gemma 4 Multimodal Fine-Tuner for Apple Silicon

https://github.com/mattmireles/gemma-tuner-multimodal
I run Whisper large-v3 on an M2 Max with 96 GB, and even with just inference the memory gets tight on longer audio; I can only imagine what fine-tuning looks like. Does 64 GB vs. 96 GB make a meaningful difference for Gemma 4 fine-tuning, or does it just push the OOM wall back a bit? I've been wanting to try local fine-tuning on Apple Silicon, but the tooling gap has kept me on inference only so far.
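As a rough way to reason about the 64 GB vs. 96 GB question, here is a back-of-envelope memory estimate for LoRA-style fine-tuning with frozen base weights. The parameter count, LoRA fraction, and byte sizes below are illustrative assumptions, not measured numbers for Gemma 4, and activation memory (which grows with audio length) is deliberately left out.

```python
# Back-of-envelope peak-memory estimate for LoRA fine-tuning on unified
# memory. All constants are illustrative assumptions, not measurements.

def lora_finetune_gb(params_b: float, bytes_per_weight: int = 2,
                     lora_frac: float = 0.01) -> float:
    """Rough GB needed: frozen weights + LoRA params + optimizer + grads.

    Excludes activations and KV caches, which grow with sequence length
    (i.e. with audio duration) and are often what actually triggers OOM.
    """
    base = params_b * 1e9 * bytes_per_weight   # frozen base weights (bf16)
    lora = params_b * 1e9 * lora_frac * 4      # trainable LoRA params (fp32)
    opt = lora * 2                             # AdamW first/second moments
    grads = lora                               # gradients, LoRA params only
    return (base + lora + opt + grads) / 1e9

# A hypothetical 27B-parameter model lands near 58 GB before activations,
# which is exactly the zone where 64 GB is tight and 96 GB has headroom.
print(round(lora_finetune_gb(27), 1))
```

By this sketch the static footprint alone nearly fills 64 GB for a model in that size class, so the extra 32 GB mostly buys room for activation peaks on long audio rather than moving the wall by an order of magnitude.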
I'm pretty excited about the Edge Gallery iOS app with Gemma 4 on it, but it seems like they hobbled it: no access to Intents, and you have to write custom plugins for web search, etc. Does anyone have a favorite way to run these usefully? ChatMCP works pretty well but only supports models via API.
Nice! I've been wanting to try local audio fine-tuning. Hopefully it works with music vocals too.
> I had 15,000 hours of audio data

Do you really need that much data for fine-tuning?

Just a heads-up: I found NVIDIA Parakeet to be way better than Whisper. It's faster, uses less compute, the output is better, and there are more options for the output format. I am using parakeet-mlx from the command line. Check it out!
Thanks for doing this. Looks interesting, I'm going to check it out soon.
This is super cool, will definitely try it out! Nice work