I've always found the Gemma models to vastly underperform Qwen on vision tasks, so that's nothing new.
The Qwen series adopted vision way earlier than anyone else. No idea why the other labs were sleeping on it, but the Qwen team had about 2 years of experimentation without any competition.