How can I implement text-to-speech and speech-to-text in my Rasa chatbot?

I have built a chatbot and want to add speech recognition to it (Google's). I have not been able to find an informative tutorial or resource. Can somebody help me with this?

Thanks!


There is a tutorial about using the STT and TTS capability of the browser to test this.

Note that the code differs between Firefox and Chrome (Chrome exposes speech recognition under the prefixed name webkitSpeechRecognition), so if you want to support multiple browsers, you need to detect the browser and maintain multiple versions of the code. If you need a voice source other than the user’s browser (or if you absolutely want Google Speech-to-Text, even in Firefox), then it’s up to you to build the chain: capture the speech, send it to the Google STT API, retrieve the text on your server, send it to RASA, receive the text answer, send the answer to Google Text-to-Speech, and finally play it back to the user. It sounds complex, but all the components have a well-documented REST API, so it’s more of a piping exercise.
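To make the server-side part of that chain concrete, here is a minimal Python sketch. It is only an illustration under several assumptions: Rasa is running locally with the REST channel enabled at `http://localhost:5005/webhooks/rest/webhook`, the Google APIs are called with an API key passed as a `key` query parameter, and the incoming audio is 16 kHz LINEAR16 in `en-US`. The helper names (`post_json`, `transcribe`, `ask_rasa`, `synthesize`, `voice_turn`) are made up for this example, and capturing and playing the audio on the client is left out.

```python
import base64
import json
import urllib.request
from typing import Any, Callable

# Google's public REST endpoints; the Rasa URL assumes a local server
# with the REST channel enabled (assumption, adjust to your setup).
STT_URL = "https://speech.googleapis.com/v1/speech:recognize"
TTS_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"

PostJson = Callable[[str, dict], Any]


def post_json(url: str, payload: dict) -> Any:
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def transcribe(audio: bytes, api_key: str, post: PostJson = post_json) -> str:
    """Send raw audio to Google STT and return the top transcript."""
    body = {
        # Encoding, sample rate and language are assumptions about the input.
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": "en-US",
        },
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }
    data = post(f"{STT_URL}?key={api_key}", body)
    results = data.get("results", [])
    return results[0]["alternatives"][0]["transcript"] if results else ""


def ask_rasa(text: str, sender: str, post: PostJson = post_json) -> str:
    """Send the user's text to Rasa and join the bot's text replies."""
    replies = post(RASA_URL, {"sender": sender, "message": text})
    return " ".join(r["text"] for r in replies if "text" in r)


def synthesize(text: str, api_key: str, post: PostJson = post_json) -> bytes:
    """Send text to Google TTS and return the decoded MP3 bytes."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": "en-US"},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    data = post(f"{TTS_URL}?key={api_key}", body)
    return base64.b64decode(data["audioContent"])


def voice_turn(audio: bytes, api_key: str, sender: str = "user",
               post: PostJson = post_json) -> bytes:
    """One full turn: speech in -> text -> Rasa -> text -> speech out."""
    text = transcribe(audio, api_key, post)
    answer = ask_rasa(text, sender, post)
    return synthesize(answer, api_key, post)
```

Calling `voice_turn(recorded_bytes, "YOUR_API_KEY")` would return MP3 bytes to send back to the client for playback. The `post` parameter is there only so the HTTP layer can be swapped out, for example for a stub while testing. This version sends each message in one piece, so it does not cover the streaming variant.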

NB: by streaming the audio back and forth instead of sending it in one piece, you get lower latency, since transcription and playback can start immediately without waiting for the end of the message. The Google API docs explain how to do this.

NB2: don’t forget that if you use the Google API for STT/TTS, you lose one of the fantastic advantages of using RASA: keeping your data confidential and on premise.
