Making a Voice Bot with Rasa, Text-to-Speech & ASR

Hello everyone, I’ve been working with Rasa for a month now. I have implemented my data and trained the models, and I’m satisfied with the results. The next step in my project is to integrate Rasa with Speech-to-Text and Text-to-Speech systems to build a fully working voice bot. Later on, I will use the voice bot as an interviewer bot by integrating it into Google Meet.

What I have tried so far is following this guide. I ran into a lot of versioning issues while setting up the environment, then kept hitting errors, and I concluded it wasn’t going to work, especially after checking the GitHub issues section of rasa-voice-interface (lots of issues posted, none of them resolved).

Right now I’m really lost on how to make this voice bot actually work, and I would really appreciate some guidance if anyone here has worked on something similar.

Text-to-Speech tested & implemented: pyttsx3

Speech-to-Text (ASR) tested & implemented: Whisper
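A minimal sketch of wrapping both tools so the rest of the bot only handles plain strings. It assumes the `pyttsx3` and `openai-whisper` packages are installed; the helper names (`pick_voice`, `speak`, `load_asr`, `transcribe`), the `"small"` model choice and the French voice-matching heuristic are hypothetical. Voice IDs and names differ between pyttsx3's drivers (SAPI5 on Windows, NSSpeechSynthesizer on macOS, eSpeak on Linux), so the matching below is only a best-effort guess.

```python
import re
from functools import lru_cache


def pick_voice(voices, lang_hint="fr"):
    """Return the id of the first voice whose id, name or languages contain the
    token lang_hint (e.g. "fr" in "fr_FR" or "TTS_MS_FR-FR_..."), else None."""
    hint = lang_hint.lower()
    for voice in voices:
        # Some drivers (eSpeak) report languages as bytes, so decode defensively.
        langs = [
            lang.decode("utf-8", "ignore") if isinstance(lang, bytes) else str(lang)
            for lang in (getattr(voice, "languages", None) or [])
        ]
        text = " ".join([str(voice.id), voice.name or ""] + langs).lower()
        # Compare whole tokens so a voice named "Fred" does not match "fr".
        if hint in re.split(r"[^0-9a-z]+", text):
            return voice.id
    return None


def speak(text, lang_hint="fr", rate=160):
    """Say text aloud with pyttsx3 (blocks until playback finishes)."""
    import pyttsx3  # imported lazily so pick_voice works without a TTS backend

    engine = pyttsx3.init()
    voice_id = pick_voice(engine.getProperty("voices"), lang_hint)
    if voice_id is not None:
        engine.setProperty("voice", voice_id)
    engine.setProperty("rate", rate)  # speaking rate in words per minute
    engine.say(text)
    engine.runAndWait()


@lru_cache(maxsize=None)
def load_asr(model_name="small"):
    """Load a Whisper checkpoint once and reuse it across calls."""
    import whisper  # the openai-whisper package

    return whisper.load_model(model_name)


def transcribe(audio_path, model=None, language="fr"):
    """Transcribe one audio file; pinning the language skips auto-detection."""
    if model is None:
        model = load_asr()
    result = model.transcribe(audio_path, language=language)
    return result["text"].strip()
```

With this in place, `transcribe("answer.wav")` returns the French transcript as a string and `speak("Bonjour, pouvez-vous vous présenter ?")` reads a reply aloud; these two calls are all a Rasa loop needs per turn.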

Here’s a showcase of something very similar to what I would like to achieve:

Do you want your bot to be available via the phone? If so, I would try the Twilio Voice connector.

Not really; I will link the bot to Google Meet later on.

As I mentioned, what I’m struggling with at this step is linking the Text-to-Speech library (pyttsx3) and the Speech-to-Text tool (Whisper) to Rasa.
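One way to link the pieces without the rasa-voice-interface frontend is to treat Rasa as a text-in/text-out service through its REST input channel: add a `rest:` entry to `credentials.yml`, start the server with `rasa run` (port 5005 by default), and POST each transcript to `/webhooks/rest/webhook`. The sketch below assumes that setup. `transcribe` and `speak` are injected callables (for example thin wrappers around Whisper's `model.transcribe(...)["text"]` and pyttsx3's `say()`/`runAndWait()`); the function names and the `"candidate-1"` sender ID are hypothetical. HTTP is done with the standard library only.

```python
import json
import urllib.request

# Default endpoint of Rasa's REST input channel when running `rasa run` locally.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_request(text, sender, url=RASA_URL):
    """Build the POST request the REST channel expects: {"sender": ..., "message": ...}."""
    payload = json.dumps({"sender": sender, "message": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST"
    )


def parse_reply(raw):
    """Keep only the text messages from the list of bot messages Rasa returns."""
    return [msg["text"] for msg in json.loads(raw) if "text" in msg]


def ask_rasa(text, sender="candidate-1", url=RASA_URL, timeout=30):
    """Send one user utterance to Rasa and return the bot's text replies."""
    with urllib.request.urlopen(build_request(text, sender, url), timeout=timeout) as resp:
        return parse_reply(resp.read())


def voice_turn(audio_path, transcribe, speak, ask=ask_rasa, sender="candidate-1"):
    """One interview turn: audio -> transcript -> Rasa -> spoken replies."""
    user_text = transcribe(audio_path)
    if not user_text:  # silence or an empty transcript: nothing to send
        return []
    replies = ask(user_text, sender=sender)
    for reply in replies:
        speak(reply)
    return replies
```

Rasa uses the `sender` value as the conversation ID, so giving each interviewee a distinct one keeps their dialogue state separate. Capturing the microphone (or, later, the Google Meet audio) into `audio_path` is outside this sketch.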

Take a look at this post


How many languages does this TTS support? The online text-to-speech AI I’m currently using supports over 40 languages, so I need a rough idea. Also, how fast and accurate are the results? A rough time frame compared to similar software would be fine, since I need to process a large amount of data.

I’m not sure exactly how many languages it supports, but for French, which is the language I need, it was the best, which is why I picked it. I remember it covering many languages, including English, Spanish and French. Results are very fast and very accurate compared to the other tools I tried: I benchmarked around 10 tools, but as I said, only on French models. It also offers several model sizes for every language, ranging from tiny (decent accuracy, very fast) to large (very good accuracy, 5-10x slower than tiny).
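To put numbers on the speed/accuracy trade-off across model sizes, a small harness can time each size on the same French clips and score it with word error rate (WER). The tiny-to-large range above matches the checkpoint names of OpenAI's Whisper (tiny, base, small, medium, large). This is a sketch under assumptions: `load_model` is any callable returning an object with a Whisper-style `transcribe(path, language=...)` method, the helper names are hypothetical, and the WER here only lowercases and splits on whitespace, so punctuation differences count as errors.

```python
import time


def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / max(len(ref), 1)


def benchmark(load_model, model_names, samples, language="fr"):
    """samples: list of (audio_path, reference_transcript) pairs.
    Returns seconds per file and mean WER for each model size."""
    report = {}
    for name in model_names:
        model = load_model(name)  # loading time is deliberately not measured
        start = time.perf_counter()
        hyps = [model.transcribe(path, language=language)["text"].strip()
                for path, _ in samples]
        elapsed = time.perf_counter() - start
        errors = [wer(ref, hyp) for (_, ref), hyp in zip(samples, hyps)]
        report[name] = {"sec_per_file": elapsed / len(samples),
                        "mean_wer": sum(errors) / len(errors)}
    return report
```

With Whisper installed, this could be called as `benchmark(whisper.load_model, ["tiny", "base", "small"], [("q1.wav", "bonjour je m'appelle ana")])`. Running it on a representative sample of your own recordings gives a per-file time estimate you can extrapolate to a larger dataset.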