Since ollama has diverged from llama.cpp, it will take a bit of time for ollama to support multi-modality. If you're using plain llama.cpp, it looks like a PR adding vision and audio support for this model has already been merged:

https://github.com/ggml-org/llama.cpp/pull/24077

They've actually gone back to (a lightly patched) llama.cpp as of the 0.30 release a few weeks ago, and have now vendored in an up-to-date release. Needless to say, this is great news for both projects!