The era of open voice assistants
https://www.home-assistant.io/blog/2024/12/19/voice-preview-edition-the-era-of-open-voice/

I noticed recently there weren't any good open source hardware projects for voice assistants with a focus on privacy. There's another project I've been thinking about where I think the privacy aspect is important, and figuring out a good hardware stack has been a process. The project I want to work on isn't exactly a voice assistant, but it has the same ultimate hardware requirements.
Something I'm kinda curious about: it sounds like they're planning on a sorta batch manufacturing by resellers type of model. Which I guess is pretty standard for hardware sales. But why not do a sorta "group buy" approach? I guess there's nothing stopping it from happening in conjunction
I've had an idea floating around for a site that enables group buys for open source hardware (or 3d printed items), that also acts like or integrates with github wrt forking/remixing
Same for their voice assistant. You can buy their hardware and get started right away, or you can place your own mics and speakers around your home and it will still work. You can buy your own beefy hardware and run your own LLM.
The possibilities with home assistant are endless. Thanks to this community for breaking the barriers created by big tech
I cannot wait to buy 5 or more of these to replace Alexa. HA is the brain of my house and up till now Alexa provided the best hardware to interact with HA (IMHO) but I'd love something first-party.
At various times in the past, the teams involved in such projects have at least prototyped extremely invasive features with those in-home devices. For example, one engineer I've visited with from a well-known in-home device manufacturer worked on classifiers that could distinguish between two people having sex and one person attacking another in audio captured passively by the microphones.
As corporate culture and leadership shift over time, I have marginal confidence that these prototypes will perpetually remain undeveloped or on-device only. Apple, for instance, has decided to send a significant amount of personal data to their "Private Cloud" and is taking the tactic of opening "enough" of its infrastructure for third-party audit to make an argument that the data they collect will only be used in a way that the user is aware of and approves of. Maybe Apple can get something like that to a good enough state, at least for a time. However, they're inevitably normalizing the practice. I wonder how many competitors will be equally disciplined in their implementations.
So my takeaway is this: If there exists a pathway between a microphone and the Internet that you are not in 100% control over, it's not at all unreasonable to expect that anything and everything that microphone picks up at any time will be captured and stored by someone else. What happens with that audio will -- in general -- be kept out of your knowledge and control so long as there is insufficient regulatory oversight.
By "I don't fully understand," I mean just that. There's a lot of marketing copy, but there's a lot I'd like to understand better before plopping down $$$ for a unit. The answers might be reasonable.
Ideally, I'd be able to experiment with a headset first, and if it works well, upgrade to the $59 unit.
I'd love to just have a README with a getting-started tutorial, play around with it, and then upgrade if it does what I want.
Again: none of this is a complaint. I assume much of this is coming once we're past the preview edition, or is perhaps there already and my search skills are failing me.
An actually good product in this space IMO needs to be able to define specific sets of actions and allow agents to perform only the permitted actions.
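Purely to illustrate what I mean (nothing here is from the announcement, and all names are made up): the LLM never gets raw control, it can only propose an action name plus arguments, and anything outside an explicit allowlist is rejected before it runs.

```python
# Hypothetical sketch: constrain an LLM "agent" to an explicit allowlist of actions.
# The model only proposes (action, args); anything not registered is refused.
from typing import Callable

PERMITTED_ACTIONS: dict[str, Callable[..., str]] = {}

def permitted(name: str):
    """Register a function as an action the agent is allowed to invoke."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        PERMITTED_ACTIONS[name] = fn
        return fn
    return register

@permitted("set_thermostat")
def set_thermostat(temperature_c: float) -> str:
    return f"thermostat set to {temperature_c:.1f} C"  # stand-in for a real device call

@permitted("toggle_light")
def toggle_light(room: str, on: bool) -> str:
    return f"light in {room} turned {'on' if on else 'off'}"

def execute(proposal: dict) -> str:
    """proposal is whatever the LLM emitted, e.g. {"action": "...", "args": {...}}."""
    action = proposal.get("action")
    if action not in PERMITTED_ACTIONS:
        return f"refused: '{action}' is not a permitted action"
    return PERMITTED_ACTIONS[action](**proposal.get("args", {}))

# execute({"action": "unlock_front_door", "args": {}}) -> "refused: ..."
```

The point is that the permission boundary sits outside the model, so a confused or prompt-injected agent can't reach anything you didn't explicitly hand it.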
The ReSpeaker has 4 mics and can easily cancel out the noise introduced by a custom external speaker.
They recommend an N100 in the blog post, but I might buy one anyway to see if my HA box's Celeron J3455 will do the job.
I really hope we see some open-source machine-learned systems emerge.
I saw Insta360 announce their video conferencing solution today. The optics look pretty middling, nothing wild, but Insta360 is so good at video that I expect it'll be great. There's a huge 14-microphone array on it, though, and that's the hard job: figuring out how to get good audio from speakers in a variety of locations around a room. It really made me wish for more open source footing here, some promising start, be it the conference room or the open living space. I've given all of 60 seconds to look through this, and was kinda hopeful because heck yeah Home Assistant, but my initial read isn't super promising; it's not clear this is laying down the software base needed to listen well to the world.
https://petapixel.com/2024/12/17/the-insta360-connect-is-a-2...
I’m assuming they eventually want to create their own LLM, and something privacy focused would be a good match for their customers. I don’t know how they feel about open source though.
Llama and Whisper are already public, so that should help innovation in this area.
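As a rough illustration of how low the barrier already is (assuming the openai-whisper package and a local recording; this isn't tied to Home Assistant's actual pipeline, and the filename is made up):

```python
# Minimal local speech-to-text with the open openai-whisper package.
# Nothing leaves the machine; the model weights are downloaded once and cached.
import whisper

model = whisper.load_model("base")        # small model, runs on CPU or GPU
result = model.transcribe("command.wav")  # path to a local audio file
print(result["text"])                     # e.g. "turn off the living room lights"
```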
This makes sense for cars, where there's much local stuff to control. But for a home unit, what do you want to do that is entirely local? Turning the heat up and down gets boring after a while. If it does entertainment selection or shopping, it needs outside world connections.
(Today's rant: I recently purchased a humidifier. It's just a little unit with a water tank, a water-softening filter, and an ultrasonic vaporizer. That part works fine. Then there are the controls.
All this thing really needs is an on-off switch and a humidity knob, and maybe lights for power, humidification, and water tank empty. But no. It has five touch buttons and a round display about four inches across. The display is on even if the unit is off. Pressing the on/off button turns it on. If it's humidifying, there's a whole light show. The tank lights up purple. Swooping arcs of blue run up both edges of the round display. It's very impressive, especially in a dark bedroom. If you press and hold the second button for two seconds, about half the light show is suppressed.
There are three fan speeds, and a button for that. Only the highest one will propel the water vapor high enough to avoid it hitting the floor and uselessly condensing before it mixes with the air. So that feature was not necessary.
The display shows one number. It's usually the current humidity, but if you press the humidity set button, the number displayed becomes the setting, which is changed upwards by successive presses until it wraps around. After a few seconds, the display reverts to current humidity.
Turning the unit off or removing the water tank resets all settings to the default.
This is the low-end unit. The next step up comes with an IR remote. It's one way - the remote has buttons but no display. Since you have to be close to the display to use the buttons effectively, that doesn't help much. The step up after that is, inevitably, a cloud-based phone app.
So this thing could potentially be interfaced to a voice assistant. That's only useful if there's enough information coming back from the device that the assistant software knows what the device is doing, and the assistant software understands that device status. If all it does is send remote button pushes, the result will be frustration.
So you need some degree of intelligence at both ends - the end that talks to the human, and the end that talks to the device. If the user says "House, it's too dry in here", the assistant system needs to be able to check the status of the humidifier. Has power? Talking? On? Humidity setting reasonable? Fan running? Tank not empty? If it can't do that, it's part of the problem, not part of the solution.)
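(A toy sketch of that intelligence at the device end, assuming the humidifier actually reports state rather than just accepting button presses; every name here is invented for illustration:

```python
# Hypothetical: what an assistant could do with "House, it's too dry in here"
# if the humidifier exposes real status instead of only taking remote button pushes.
from dataclasses import dataclass

@dataclass
class HumidifierState:
    reachable: bool
    powered: bool
    running: bool
    target_humidity: int  # percent
    fan_speed: int        # 1..3
    tank_empty: bool

def diagnose(state: HumidifierState, current_humidity: int) -> str:
    if not state.reachable:
        return "The humidifier isn't responding."
    if state.tank_empty:
        return "The humidifier's water tank is empty."
    if not state.powered or not state.running:
        return "The humidifier is off; turning it on."
    if state.target_humidity <= current_humidity:
        return f"Raising the humidity target above {current_humidity}%."
    if state.fan_speed < 3:
        return "Setting the fan to high so the vapor actually disperses."
    return "The humidifier is already doing all it can."
```

If all the assistant can do is blindly replay IR button presses, none of those branches are possible, which is exactly the frustration described above.)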
- Full privacy: nothing goes to the "cloud"
- Non-shitty microphones and processing: I want to be able to be heard without having to yell, repeat, or correct
- No wake words: it should listen to everything, process it, and understand when it's being addressed. Since everything is private and local, this is now doable (a crude sketch of this follows the list)
- Conversational: it should understand when I finished talking, have ability to be interrupted, all with low latency
- Non-stupid: it's 2024, and alexa and siri and google are somehow absolutely abysmal at doing even the basics
- Complete: I don't want to use an app to get stuff configured. I want everything to be controlled via voice
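Nothing like this exists in the product as far as I know, but the "no wake words" item is roughly: transcribe everything locally, then decide per utterance whether it was addressed to the assistant. A crude sketch of that decision step (the real thing would need an actual local classifier, not keyword matching; the hint list and function names are invented):

```python
# Crude stand-in for "understand when it's being addressed" without a wake word.
# In practice this would be a small local classifier over the transcript (and maybe
# speaker direction); this only shows where the decision sits in the pipeline.
ADDRESSED_HINTS = ("turn", "set", "dim", "play", "remind", "what's", "how long")

def addressed_to_assistant(transcript: str) -> bool:
    text = transcript.lower()
    return any(f" {hint} " in f" {text} " or text.startswith(hint)
               for hint in ADDRESSED_HINTS)

def handle(transcript: str) -> None:
    if addressed_to_assistant(transcript):
        print(f"acting on: {transcript}")
    else:
        pass  # private conversation: dropped immediately, never stored or uploaded
```

The privacy argument only holds because every stage runs locally; the same always-listening pipeline pointed at a cloud endpoint would be exactly the problem described upthread.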
And if not, I would be curious to know why it hasn't been fully open sourced.
The first thought I had when I encountered LLMs was that they could finally make these devices understand you and actually be useful... without me needing to know some prescripted keywords.
My only remaining wish is that I can replace Siri with this (without needing some workaround)
A 3090 is too expensive and power hungry. Maybe a 3060 12GB? Is there anything in the "workstation" lineup that is more efficient, especially since I don't need the video outs?
Is this a fully private, open source alternative to Alexa that by definition requires a local CPU to run?
Is the device supposed to be the nerve center of IoT devices?
Can it access the Internet over Wi-Fi to do web lookups on command (music, Google, etc.)?