I have not managed to find any relevant information on this subject: what is the best practice for including a text-to-speech engine in Rasa Core? I have my own TTS engine; I just want to switch from textual feedback to synthesized feedback.

  • Should I implement this feature inside my action.py?

  • Should I write my own dispatcher (via OutputChannel)?
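
To make the second option concrete, here is a minimal, framework-agnostic sketch of the idea. It is only an illustration under stated assumptions: `TextChannel` is a hypothetical stand-in for Rasa Core's `OutputChannel` (the real class and its method signatures differ between versions and should be checked against the installed release), and `fake_tts` is a placeholder for your own engine.

```python
from typing import Callable, List, Tuple


class TextChannel:
    """Hypothetical stand-in for Rasa Core's OutputChannel (not the real class)."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def send_text_message(self, recipient_id: str, message: str) -> None:
        # Record the text reply; a real channel would deliver it to the user.
        self.sent.append((recipient_id, message))


class SpeechChannel(TextChannel):
    """Sends the text reply and, when enabled, also a synthesized version."""

    def __init__(self, synthesize: Callable[[str], bytes], speak: bool = True) -> None:
        super().__init__()
        self.synthesize = synthesize  # your own TTS engine goes here (assumption)
        self.speak = speak            # toggle between text-only and speech output
        self.audio: List[Tuple[str, bytes]] = []

    def send_text_message(self, recipient_id: str, message: str) -> None:
        super().send_text_message(recipient_id, message)
        if self.speak:
            # Synthesize the same text and queue the audio for playback.
            self.audio.append((recipient_id, self.synthesize(message)))


def fake_tts(text: str) -> bytes:
    """Placeholder engine: returns the UTF-8 bytes instead of real audio."""
    return text.encode("utf-8")


channel = SpeechChannel(fake_tts, speak=True)
channel.send_text_message("user-1", "Hello!")
```

Keeping the synthesis in the channel rather than in action.py means the actions stay unaware of how the reply is rendered, and the `speak` flag is enough to fall back to plain text.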

Running into the same issue. Please let me know if you find any answer. Thanks.

Sure, I’m currently reverse engineering the source code! I’ll let you know.

Thanks for the quick reply. Which technology are you using for TTS? Are there any open source tools for text-to-speech and speech-to-text? Thanks.

There are several and they are getting better, but they don’t come close to Google quality yet.


  • For ASR (speech-to-text): Last summer, my job was to build an embedded speech recognizer that could run in real time on a Jetson TX2. My work was inspired by a mix of the DeepSpeech (Mozilla) and wav2letter (Facebook) architectures. If you are new to the subject, I suggest using Keras, as it has a “friendly” programming interface. There are several implementations of such architectures if you google it!
  • For TTS (text-to-speech): I’m currently using a solution from a sister company called Acapela. If I had to build my own, I would start from “Tacotron 2” from Google, which I consider the best approach. I think there are some implementations on GitHub.



Thank you Asokolow. I’m pretty much new to ASR and TTS technologies. I will look into Keras and Tacotron 2. My requirement is to build a chatbot with speech capabilities that needs to be integrated into an Angular 6 web application.

If your target application is web-based, I suggest having a look at TensorFlow.js instead of Keras! TensorFlow is a bit more complex than Keras, but it is worth every penny (a.k.a. time)!

