RASA multilingual chatbot - only NLU or complete chatbot?


When developing a chatbot meant to converse in a language other than English, is it advisable to implement the NLU component (user intent identification) in English and write the chatbot replies (the utterances defined in domain.yml) in the target language?

My reasoning is that in English, one can easily use pretrained embeddings from transformer language models such as BERT or GPT to classify intents. In the target language, especially if it is a low-resource language, the corresponding transformer models may be weaker, which could lower intent classification accuracy.

The approach would then be to take the input text in the low-resource language, translate it to English, have the NLU component predict the intent from the English text, and reply accordingly. However, do the replies also have to be in English (and then be translated back to the source language)? Is there any issue with writing the chatbot replies directly in the source language?

Thank you!

I recommend keeping a single repo and training a separate model for each language. Share the rules and stories across languages, but keep the NLU training data (the intent examples) in a separate directory per language. Each language then needs its own rasa train command that pulls in the language-specific NLU data while using the common rules/stories.
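
As a rough sketch of what those per-language train commands could look like, here is a small Python helper that builds one rasa train call per language. The directory layout (data/shared, data/nlu/<lang>, config_<lang>.yml) and the model_<lang> naming are assumptions made up for illustration, not something Rasa requires. The flags used (--data, --config, --domain, --out, --fixed-model-name) are standard rasa train options. If the replies differ per language, a per-language domain file could be passed instead of the shared domain.yml.

```python
import subprocess
from pathlib import Path


def build_train_command(lang, root=Path("."), domain="domain.yml"):
    """Build the `rasa train` argument list for one language.

    Assumed (hypothetical) layout under `root`:
      data/shared/        rules.yml and stories.yml shared by all languages
      data/nlu/<lang>/    NLU training data for that language only
      config_<lang>.yml   pipeline config for that language
    """
    root = Path(root)
    return [
        "rasa", "train",
        "--data",
        str(root / "data" / "shared"),      # common rules/stories
        str(root / "data" / "nlu" / lang),  # language-specific intents
        "--config", str(root / f"config_{lang}.yml"),
        "--domain", str(root / domain),
        "--out", str(root / "models"),
        "--fixed-model-name", f"model_{lang}",
    ]


def train_all(languages, runner=subprocess.run):
    """Train one model per language; `runner` is injectable for dry runs."""
    for lang in languages:
        runner(build_train_command(lang), check=True)
```

Calling train_all(["de", "fr"]) would then run two separate trainings and write one model per language into models/. Printing " ".join(build_train_command("de")) shows the exact command if you prefer to run it by hand.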