The era of open voice assistants
https://www.home-assistant.io/blog/2024/12/19/voice-preview-edition-the-era-of-open-voice/

I noticed recently there weren't any good open source hardware projects for voice assistants with a focus on privacy. There's another project I've been thinking about where I think the privacy aspect is important, and figuring out a good hardware stack has been a process. The project I want to work on isn't exactly a voice assistant, but it has the same ultimate hardware requirements.
Something I'm kinda curious about: it sounds like they're planning on a sorta batch manufacturing by resellers type of model. Which I guess is pretty standard for hardware sales. But why not do a sorta "group buy" approach? I guess there's nothing stopping it from happening in conjunction
I've had an idea floating around for a site that enables group buys for open source hardware (or 3d printed items), that also acts like or integrates with github wrt forking/remixing
Same for their voice assistant. You can buy their hardware and get started right away, or you can place your own mics and speakers around your home and it will still work. You can buy your own beefy hardware and run your own LLM.
The possibilities with home assistant are endless. Thanks to this community for breaking the barriers created by big tech
I cannot wait to buy 5 or more of these to replace Alexa. HA is the brain of my house and up till now Alexa provided the best hardware to interact with HA (IMHO) but I'd love something first-party.
At various times in the past, the teams involved in such projects have at least prototyped extremely invasive features with those in-home devices. For example, one engineer I've visited with from a well-known in-home device manufacturer worked on classifiers that could distinguish between two people having sex and one person attacking another in audio captured passively by the microphones.
As corporate culture and leadership shift over time, I have marginal confidence that these prototypes will perpetually remain undeveloped or on-device only. Apple, for instance, has decided to send a significant amount of personal data to their "Private Cloud" and is taking the tactic of opening "enough" of its infrastructure for third-party audit to make an argument that the data they collect will only be used in a way that the user is aware of and approves of. Maybe Apple can get something like that to a good enough state, at least for a time. However, they're inevitably normalizing the practice. I wonder how many competitors will be equally disciplined in their implementations.
So my takeaway is this: If there exists a pathway between a microphone and the Internet that you are not in 100% control over, it's not at all unreasonable to expect that anything and everything that microphone picks up at any time will be captured and stored by someone else. What happens with that audio will -- in general -- be kept out of your knowledge and control so long as there is insufficient regulatory oversight.
By "I don't fully understand," I mean just that. There's a lot of marketing copy, but there's a lot I'd like to understand better before plopping down $$$ for a unit. The answers might be reasonable.
Ideally, I'd be able to experiment with a headset first, and if it works well, upgrade to the $59 unit.
I'd love to just have a README with a getting-started tutorial, play around with it, and then upgrade if it does what I want.
Again: none of this is a complaint. I assume much of this is coming once we're past the preview edition, or is perhaps there already and my search skills are failing me.
An actually good product in this space IMO needs to be able to define specific sets of actions and allow agents to perform only the permitted actions.
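Purely to illustrate what I mean (nothing here is from the announcement, and all names are made up): the LLM never gets raw control, it can only propose an action name plus arguments, and anything outside an explicit allowlist is rejected before it runs.

```python
# Hypothetical sketch: constrain an LLM "agent" to an explicit allowlist of actions.
# The model only proposes (action, args); anything not registered is refused.
from typing import Callable

PERMITTED_ACTIONS: dict[str, Callable[..., str]] = {}

def permitted(name: str):
    """Register a function as an action the agent is allowed to invoke."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        PERMITTED_ACTIONS[name] = fn
        return fn
    return register

@permitted("set_thermostat")
def set_thermostat(temperature_c: float) -> str:
    return f"thermostat set to {temperature_c:.1f} C"  # stand-in for a real device call

@permitted("toggle_light")
def toggle_light(room: str, on: bool) -> str:
    return f"light in {room} turned {'on' if on else 'off'}"

def execute(proposal: dict) -> str:
    """proposal is whatever the LLM emitted, e.g. {"action": "...", "args": {...}}."""
    action = proposal.get("action")
    if action not in PERMITTED_ACTIONS:
        return f"refused: '{action}' is not a permitted action"
    return PERMITTED_ACTIONS[action](**proposal.get("args", {}))

# execute({"action": "unlock_front_door", "args": {}}) -> "refused: ..."
```

The point is that the permission boundary sits outside the model, so a confused or prompt-injected agent can't reach anything you didn't explicitly hand it.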
The ReSpeaker has 4 mics and can easily cancel out the noise introduced by a custom external speaker.
They recommend an N100 in the blog post, but I might buy one anyway to see if my HA box's Celeron J3455 will do the job.
I really hope we see some open-source machine-learned systems emerge.
I saw Insta360 announce their video conferencing solution today. The optics look pretty middling, nothing wild, but Insta360 is so good at video that I expect it'll be great. There's a huge 14-microphone array on it, though, and that's the hard job: figuring out how to get good audio from speakers in a variety of locations around a room. It really made me wish for more open source footing here, some promising start, be it the conference room or the open living space. I've given all of 60 seconds to look through this, and was kinda hopeful because heck yeah Home Assistant, but my initial read isn't super promising; it's not clear this is laying down the software base needed to listen well to the world.
https://petapixel.com/2024/12/17/the-insta360-connect-is-a-2...
I’m assuming they eventually want to create their own LLM, and something privacy focused would be a good match for their customers. I don’t know how they feel about open source though.
Llama and Whisper are already public, so that should help innovation in this area.
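As a rough illustration of how low the barrier already is (assuming the openai-whisper package and a local recording; this isn't tied to Home Assistant's actual pipeline, and the filename is made up):

```python
# Minimal local speech-to-text with the open openai-whisper package.
# Nothing leaves the machine; the model weights are downloaded once and cached.
import whisper

model = whisper.load_model("base")        # small model, runs on CPU or GPU
result = model.transcribe("command.wav")  # path to a local audio file
print(result["text"])                     # e.g. "turn off the living room lights"
```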
This makes sense for cars, where there's much local stuff to control. But for a home unit, what do you want to do that is entirely local? Turning the heat up and down gets boring after a while. If it does entertainment selection or shopping, it needs outside world connections.
(Today's rant: I recently purchased a humidifier. It's just a little unit with a water tank, a water-softening filter, and an ultrasonic vaporizer. That part works fine. Then there are the controls.
All this thing really needs is an on-off switch and a humidity knob, and maybe lights for power, humidification, and water tank empty. But no. It has five touch buttons and a round display about four inches across. The display is on even if the unit is off. Pressing the on/off button turns it on. If it's humidifying, there's a whole light show. The tank lights up purple. Swooping arcs of blue run up both edges of the round display. It's very impressive, especially in a dark bedroom. If you press and hold the second button for two seconds, about half the light show is suppressed.
There are three fan speeds, and a button for that. Only the highest one will propel the water vapor high enough to avoid it hitting the floor and uselessly condensing before it mixes with the air. So that feature was not necessary.
The display shows one number. It's usually the current humidity, but if you press the humidity set button, the number displayed becomes the setting, which is changed upwards by successive presses until it wraps around. After a few seconds, the display reverts to current humidity.
Turning the unit off or removing the water tank resets all settings to the default.
This is the low-end unit. The next step up comes with an IR remote. It's one way - the remote has buttons but no display. Since you have to be close to the display to use the buttons effectively, that doesn't help much. The step up after that is, inevitably, a cloud-based phone app.
So this thing could potentially be interfaced to a voice assistant. That's only useful if there's enough information coming back from the device that the assistant software knows what the device is doing, and the assistant software understands that device status. If all it does is send remote button pushes, the result will be frustration.
So you need some degree of intelligence at both ends - the end that talks to the human, and the end that talks to the device. If the user says "House, it's too dry in here", the assistant system needs to be able to check the status of the humidifier. Has power? Talking? On? Humidity setting reasonable? Fan running? Tank not empty? If it can't do that, it's part of the problem, not part of the solution.)
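(A toy sketch of that intelligence at the device end, assuming the humidifier actually reports state rather than just accepting button presses; every name here is invented for illustration:

```python
# Hypothetical: what an assistant could do with "House, it's too dry in here"
# if the humidifier exposes real status instead of only taking remote button pushes.
from dataclasses import dataclass

@dataclass
class HumidifierState:
    reachable: bool
    powered: bool
    running: bool
    target_humidity: int  # percent
    fan_speed: int        # 1..3
    tank_empty: bool

def diagnose(state: HumidifierState, current_humidity: int) -> str:
    if not state.reachable:
        return "The humidifier isn't responding."
    if state.tank_empty:
        return "The humidifier's water tank is empty."
    if not state.powered or not state.running:
        return "The humidifier is off; turning it on."
    if state.target_humidity <= current_humidity:
        return f"Raising the humidity target above {current_humidity}%."
    if state.fan_speed < 3:
        return "Setting the fan to high so the vapor actually disperses."
    return "The humidifier is already doing all it can."
```

If all the assistant can do is blindly replay IR button presses, none of those branches are possible, which is exactly the frustration described above.)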
- Full privacy: nothing goes to the "cloud"
- Non-shitty microphones and processing: I want to be able to be heard without having to yell, repeat, or correct
- No wake words: it should listen to everything, process it, and understand when it's being addressed. Since everything is private and local, this is now doable (a crude sketch of this follows the list)
- Conversational: it should understand when I finished talking, have ability to be interrupted, all with low latency
- Non-stupid: it's 2024, and alexa and siri and google are somehow absolutely abysmal at doing even the basics
- Complete: I don't want to use an app to get stuff configured. I want everything to be controlled via voice
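Nothing like this exists in the product as far as I know, but the "no wake words" item is roughly: transcribe everything locally, then decide per utterance whether it was addressed to the assistant. A crude sketch of that decision step (the real thing would need an actual local classifier, not keyword matching; the hint list and function names are invented):

```python
# Crude stand-in for "understand when it's being addressed" without a wake word.
# In practice this would be a small local classifier over the transcript (and maybe
# speaker direction); this only shows where the decision sits in the pipeline.
ADDRESSED_HINTS = ("turn", "set", "dim", "play", "remind", "what's", "how long")

def addressed_to_assistant(transcript: str) -> bool:
    text = transcript.lower()
    return any(f" {hint} " in f" {text} " or text.startswith(hint)
               for hint in ADDRESSED_HINTS)

def handle(transcript: str) -> None:
    if addressed_to_assistant(transcript):
        print(f"acting on: {transcript}")
    else:
        pass  # private conversation: dropped immediately, never stored or uploaded
```

The privacy argument only holds because every stage runs locally; the same always-listening pipeline pointed at a cloud endpoint would be exactly the problem described upthread.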
And if not, I would be curious to know why it hasn't been fully open sourced.
The first thought I had when I encountered LLMs was that they could finally make these devices understand you and actually be useful... without me needing to know some prescripted keywords.
My only remaining wish is that I can replace Siri with this (without needing some workaround)
A 3090 is too expensive and power hungry. Maybe a 3060 12GB? Is there anything in the "workstation" lineup that is more efficient, especially since I don't need the video outs?
Is this a fully private, open source alternative to Alexa that by definition requires a local CPU to run?
Is the device supposed to be the nerve center of IoT devices?
Can it access the Internet over Wi-Fi to do web lookups on command (music, Google, etc.)?