The node-red-contrib-alexa-remote2-applestrudel library used to interface with Alexa is going to be stuck due to Amazon's decision to hide events.
I invested a lot (too much) of effort developing a holistic solution in NR to manage voice inputs based on that library, so I need to look for other solutions.
What I need is a simple device able to listen for voice input (triggered by a wake word), with a Node-RED flow able to catch it. It should be able to accept input in Italian. Nothing more, nothing less.
Are there mature solutions able to perform such a task?
Are there stable NR libs available to use Google devices' inputs?
Are there other contribs able to trap voice input events? (I can't imagine such a case, but who knows?)
No problem. Moral support is enough.
I wrote a trivial general-purpose Python service (to be hosted on the RasPi server) that assumes a producer (the microphone) sends a JSON data structure to the service (I don't know what on earth the microphone output actually is…). The service calls a manager (wake-word analysis) and then, if a wake word is found, sends the result along with the speech-to-text content.
I now need to understand at least:
what data type the microphone produces and how (REST, socket?) it sends the data to the service
how the service can call the wake-word processor, passing the received data
who, how, when, and where the voice stream is reduced to text
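To make those questions concrete, here is a minimal sketch of the service described above, assuming the producer base64-encodes raw PCM audio (commonly 16-bit mono at 16 kHz) into a JSON message. The wake-word check and speech-to-text calls are stubs standing in for a real engine (openWakeWord, Whisper, etc.), and all names here are hypothetical, not an actual API.

```python
import base64
import json

# Placeholder byte pattern standing in for a real wake-word detector.
WAKE_PATTERN = b"\x00\x01"


def wake_word_found(pcm):
    """Stub for the wake-word manager (a real one would score PCM frames)."""
    return WAKE_PATTERN in pcm


def speech_to_text(pcm):
    """Stub for a speech-to-text engine (e.g. Whisper)."""
    return "<transcribed text>"


def handle_message(raw):
    """Parse one JSON message from the producer and run the pipeline.

    Assumption: the message looks like {"device": ..., "audio": <base64 PCM>}.
    Returns None when no wake word is detected.
    """
    msg = json.loads(raw)
    pcm = base64.b64decode(msg["audio"])
    if not wake_word_found(pcm):
        return None  # no wake word: discard this chunk
    return {"device": msg.get("device"), "text": speech_to_text(pcm)}


# Example message a producer might send over a socket or REST call:
example = json.dumps({
    "device": "kitchen-mic",
    "audio": base64.b64encode(b"\xaa\x00\x01\xbb").decode(),
})
print(handle_message(example))
```

Whether the transport is REST or a raw socket, the service side stays the same: receive one JSON document per utterance chunk and feed the decoded PCM to the detector.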
Some collected news.
The Atom Echo is not feasible for wake word detection: that function requires much more power than an ESP32 CPU can provide, so you should be prepared to move this stage to an external CPU running the openWakeWord or Porcupine framework. The other stage is speech-to-text; again, the available solutions require quite a bit of processing power.
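Moving detection to an external CPU means the device just streams raw audio and the server slices it into fixed-size frames for the engine. As a sketch, assuming 16 kHz mono 16-bit PCM and an openWakeWord-style engine that consumes roughly 80 ms frames (the exact frame size is engine-dependent, so check the framework's docs):

```python
FRAME_SAMPLES = 1280             # ~80 ms at 16 kHz (engine-dependent assumption)
FRAME_BYTES = FRAME_SAMPLES * 2  # 16-bit samples -> 2 bytes each


def split_frames(pcm, frame_bytes=FRAME_BYTES):
    """Split a PCM byte stream into complete frames, dropping any trailing partial frame."""
    return [pcm[i:i + frame_bytes]
            for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)]


# On the external CPU the detection loop would look roughly like:
#   for frame in split_frames(buffer):
#       score = detector.predict(frame)   # hypothetical detector object
#       if score > THRESHOLD:
#           ...hand the following audio to speech-to-text...
```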
I think this project requires far more effort than I can, or want to, dedicate to it.
FYI there is a new update to applestrudel that seems to have fixed the issue for now. But I agree, I feel like we are on borrowed time at this point. I am currently playing around with an ESP32-S3-BOX and Willow, but nothing to report right now.
It has a docker image that I got working on Unraid without any fuss, so presumably yes, it could be integrated into the “User Integrations” of CC. Only issue is that Willow is still pretty new, so afaik, there is no way to integrate stuff like the GPIO or sensor boards included with the boxes.
We are currently considering the best way to go about user control of these GPIOs. Ideally we could use esphome or some other established/standard way to configure them but we haven’t completely thought this through yet. - Hardware - Willow
The ESP BOX provides 16 GPIOs to the user that are readily accessed from sockets on the rear of the device. We plan to make these configurable by the user to enable all kinds of interesting maker/DIY functions. - What Willow Is (And Isn’t) - Willow
Did you test the voice recognition? The only supported languages are currently Chinese and English. To replace the Echos in my family, it should recognize Italian too.
Can I get the recognized command outside the box? How?
Sorry for so many questions…
That is probably a dealbreaker then, at least for now.
This is its intended purpose by default; its stock programming is simply a speech-to-text engine.
You can send that text to a variety of destinations like Home Assistant, openHAB, or just raw to some listener using its REST API:
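For the "raw to some listener" case, a minimal sketch of such a listener in stdlib Python. The payload shape ({"text": ...}) and the endpoint are my assumptions for illustration; check Willow's documentation for the real format it POSTs.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class CommandHandler(BaseHTTPRequestHandler):
    """Catches a POSTed JSON payload assumed to carry the recognized text."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        # Hand the text off to Node-RED, MQTT, etc. from here.
        print("recognized:", payload.get("text"))
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence the default per-request logging


# To run standalone (blocks forever):
# HTTPServer(("0.0.0.0", 8088), CommandHandler).serve_forever()
```

A Node-RED http-in node could play the same role; this just shows how little is needed to receive the text.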
I haven’t played around with it too much, but it seems to be very quick and decently accurate. It might struggle with heavy accents though. It is going to be a while until you get something as adaptable as Alexa or GH.
OK. At this point Home Assistant's Year of the Voice is progressing pretty well and seems to have some real potential. Even better, the voice assistant hardware only costs about $13 to build.
I am still catching up on the details, but it seems like they have made a ton of progress, including the ability to create and train your own CUSTOM wake words!!!
I have NOT made one or tested it out yet myself; it's on my to-do list (meaning it'll likely be months before I even order one), so whoever beats me to testing it out, please report back!
I've been watching this. The process still seems complex for the average Joe. Not that I couldn't muddle through it, but several moving parts have to come into alignment for it to work. I'm looking forward to it becoming slightly more user-friendly.