Strange that they are feeding raw audio in. Even in humans, there is a hardware transform to the frequency domain (the cochlea) before the data reaches the brain. Effectively doing this part inside the LLM seems inefficient.
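To make that concrete, here is a rough sketch of the kind of fixed front end I mean: a short-time Fourier transform that turns raw samples into per-frame frequency magnitudes before any model sees them. The 16 kHz sample rate, 25 ms frames, 10 ms hop, and the `magnitude_spectrogram` name are my own assumptions for illustration, not anything taken from the system being discussed.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=400, hop=160):
    """Hypothetical helper: windowed FFT magnitudes, one row per frame."""
    window = np.hanning(frame_len)                    # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop   # only frames that fit entirely
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # rfft keeps the non-negative frequencies: frame_len // 2 + 1 bins per frame
    return np.abs(np.fft.rfft(frames, axis=1))

# One second of a 440 Hz tone at an assumed 16 kHz sample rate
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)

spec = magnitude_spectrogram(tone)
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 400   # bin spacing is sample_rate / frame_len
```

With these settings, 16000 raw samples become 98 frames of 201 frequency bins each, and the tone shows up as a single dominant bin. The model then gets far fewer time steps with the frequency structure already made explicit, which is roughly the job the cochlea does.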