Translation Layer

Currently, Rasa Pro with CALM does not support translation out of the box. I think it would be a good idea to make it possible to add a translation layer that translates user messages to English, saves the original language in metadata, and translates the bot’s message back to the original language, maybe using NLG (prompted so that it rephrases and translates in one step).

The contextual rephraser can do translation with the LLM if you customise the prompt; see the Contextual Response Rephraser documentation.

You can store the user’s language in a slot, or you can tell the LLM to detect the user’s language and respond in the same language by adding something like the following to the rephraser prompt:

Respond in the same language as USER’s Current input. Language translation is acceptable if needed.
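To make the two options concrete, here is a minimal Python sketch of how the instruction could be appended to a rephraser prompt. Everything here is an assumption for illustration: `BASE_PROMPT` is not Rasa's actual default template, `build_rephraser_prompt` is a hypothetical helper, and `user_language` stands for whatever slot value you choose to store. In a real bot you would put the instruction into your custom rephraser prompt template instead.

```python
from typing import Optional

# Hypothetical base prompt; Rasa's real default rephraser template differs.
BASE_PROMPT = (
    "Rephrase the suggested AI response, keeping its meaning.\n"
    "USER's current input: {user_input}\n"
    "Suggested AI response: {suggested_response}\n"
)

# Instruction from the post above: let the LLM detect the language itself.
DETECT_INSTRUCTION = (
    "Respond in the same language as USER's current input. "
    "Language translation is acceptable if needed."
)


def build_rephraser_prompt(
    user_input: str,
    suggested_response: str,
    user_language: Optional[str] = None,
) -> str:
    """Return the prompt with a language instruction appended.

    If a language is known (e.g. read from a slot), name it explicitly;
    otherwise fall back to asking the LLM to match the user's language.
    """
    prompt = BASE_PROMPT.format(
        user_input=user_input,
        suggested_response=suggested_response,
    )
    if user_language:
        instruction = (
            f"Respond in {user_language}. "
            "Language translation is acceptable if needed."
        )
    else:
        instruction = DETECT_INSTRUCTION
    return prompt + instruction
```

Naming the language explicitly is likely more predictable for short inputs such as "ok" or a number, where detection has little to work with.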

But how can translation of the user’s message to English be integrated into the current architecture? Creating a custom channel for this removes the ability to test the bot from shell/inspect. Creating a GraphComponent that intercepts the message and changes its text does not change the text in the tracker events, so the “current_conversation” variable will still hold the original values.
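For reference, the layer being asked for could look roughly like the sketch below. It is framework-agnostic and does not use any Rasa API: `IncomingMessage`, `TranslationLayer`, and the `translate`/`detect` callables are all hypothetical names, and the callables stand in for an LLM or machine-translation call. It only illustrates the translate-in, remember-language, translate-out flow; it does not answer where to hook it in so that the tracker stores the English text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# (text, target_language) -> translated text; stand-in for an LLM/MT call.
Translate = Callable[[str, str], str]
# text -> language code such as "en" or "es"; stand-in for a detector.
Detect = Callable[[str], str]


@dataclass
class IncomingMessage:
    """Hypothetical message container, not a Rasa class."""
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


class TranslationLayer:
    def __init__(self, translate: Translate, detect: Detect, pivot: str = "en"):
        self.translate = translate
        self.detect = detect
        self.pivot = pivot  # language the bot works in internally

    def inbound(self, message: IncomingMessage) -> IncomingMessage:
        """Translate the user text to the pivot language and record the original."""
        lang = self.detect(message.text)
        metadata = {
            **message.metadata,
            "original_language": lang,
            "original_text": message.text,
        }
        if lang == self.pivot:
            text = message.text
        else:
            text = self.translate(message.text, self.pivot)
        return IncomingMessage(text=text, metadata=metadata)

    def outbound(self, bot_text: str, metadata: Dict[str, str]) -> str:
        """Translate the bot reply back into the user's original language."""
        lang = metadata.get("original_language", self.pivot)
        if lang == self.pivot:
            return bot_text
        return self.translate(bot_text, lang)
```

The important part is that `inbound` would have to run before the message is written to the tracker; running it later gives the same problem as the GraphComponent approach described above.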

Why is it important for all of the user’s messages to be translated to English in the tracker? LLMs can understand a lot of languages, so they should perform fine with a mix of languages in the current conversation.

Mainly because you use embeddings for flow retrieval. Yes, there are multilingual embeddings, but they perform worse for languages other than English, and for some less popular languages they perform terribly.

Secondly, cheaper models (e.g. gpt-3.5) perform far worse when dealing with languages other than English. So translating the user’s input to English with a model like gpt-4 looks like the best option for keeping the bot’s quality up.