Story Detail of id 48397242 | Liveview Hacker News

briansm 10 hours ago | on: Gemma 4 12B: A unified, encoder-free multimodal model

Strange that they are feeding raw audio in. Even in humans, there is a hardware transform to the frequency domain (the cochlea) before the signal reaches the brain, so doing that step inside the LLM seems inefficient.
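For context on what "a transform to the frequency domain" usually means in audio pipelines: a conventional speech front end turns the waveform into a sequence of magnitude spectra (a spectrogram) before any model sees it. Here is a minimal sketch of that fixed step using a naive short-time Fourier transform with a Hann window. The function name `stft_magnitudes`, the 256-sample frame, the 128-sample hop and the 440 Hz / 16 kHz test tone are illustrative assumptions. Nothing here describes how Gemma handles audio.

```python
import cmath
import math


def stft_magnitudes(samples, frame_len=256, hop=128):
    """Naive short-time Fourier transform: one magnitude spectrum per frame.

    Each frame is multiplied by a periodic Hann window, then transformed with a
    direct O(frame_len^2) DFT. Only bins 0..frame_len//2 (non-negative
    frequencies) are kept, since the input is real-valued.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    # Slide a fixed-length window across the signal; trailing samples that
    # do not fill a whole frame are dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Usage: 0.1 s of a 440 Hz tone sampled at 16 kHz (illustrative values).
SAMPLE_RATE = 16_000
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(1_600)]
frames = stft_magnitudes(tone)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
# Bin spacing is 16000 / 256 = 62.5 Hz, so 440 Hz lands nearest bin 7.
print(len(frames), len(frames[0]), peak_bin)  # prints: 11 129 7
```

A production front end would normally use an FFT rather than a direct DFT, and often adds a mel filterbank and log compression. The naive version just keeps the sketch dependency-free.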

