Multilingual Chatbot for Indian Languages

ASINGH · January 20, 2021, 7:07am

We are trying to build a multilingual NLU for Indian languages. The purpose of the NLU is to understand user messages for booking an LPG cylinder or requesting a mechanic visit. The twist is that users can send messages in any Indian language. We are preparing a separate intent for each language; below is one example, the LPG gas booking intent, in different languages. I have also added an out_of_scope intent that holds out-of-scope messages in all languages that could come from users, so that we can simply reject those messages.

NLU Content:

- intent: GASBOOKING_gu
  examples: |
    - હું ગેસ બુક કરવા માંગુ છું
    - મારે સિલિન્ડર બુક કરવું છે
    - મારે ગેસ સિલિન્ડર બુક કરવું છે
    - મારા માટે ગેસ સિલિન્ડર બુક કરો
    - કૃપા કરીને ગેસ સિલિન્ડર બુક કરો
    - કૃપા કરીને ગેસ બુક કરો
    - ગેસ સિલિન્ડર બુક કરો
    - શું તમે મારા માટે સિલિન્ડર બુક કરી શકો છો?
    - બુક ગેસ
- intent: GASBOOKING_mr
  examples: |
    - मला गॅस बुक करायचा आहे
    - मला सिलिंडर बुक करायचे आहे
    - मला गॅस सिलेंडर बुक करायचा आहे
    - माझ्यासाठी गॅस सिलिंडर बुक करा
    - कृपया माझ्यासाठी गॅस सिलिंडर बुक करा
    - कृपया गॅस बुक करा
    - कृपया गॅस सिलेंडर बुक करा
- intent: GASBOOKING_hi
  examples: |
    - मैं गैस बुक करना चाहता हूं
    - मैं सिलेंडर बुक करना चाहता हूं
    - मैं गैस सिलेंडर बुक करना चाहता हूं
    - मेरे लिए गैस सिलेंडर बुक करो
    - कृपया मेरे लिए एक गैस सिलेंडर बुक करें
    - कृपया गैस बुक करें
    - कृपया गैस सिलेंडर बुक करें
- intent: GASBOOKING_en
  examples: |
    - I want to book a gas
    - I want to book a cyl
    - I want to book a cylinder
    - I want to book a gas cylinder
    - Book  gas cylinder for me
    - Book  gas for me
    - Book a cylinder for me
    - Kindly Book a gas cylinder for me
- intent: GASBOOKING_kn
  examples: |
    - ನಾನು ಗ್ಯಾಸ್ ಬುಕ್ ಮಾಡಲು ಬಯಸುತ್ತೇನೆ
    - ನಾನು ಸಿಲಿಂಡರ್ ಬುಕ್ ಮಾಡಲು ಬಯಸುತ್ತೇನೆ
    - ನಾನು ಗ್ಯಾಸ್ ಸಿಲಿಂಡರ್ ಅನ್ನು ಬುಕ್ ಮಾಡಲು ಬಯಸುತ್ತೇನೆ
    - ನನಗೆ ಗ್ಯಾಸ್ ಸಿಲಿಂಡರ್ ಬುಕ್ ಮಾಡಿ
    - ನನಗೆ ಗ್ಯಾಸ್ ಬುಕ್ ಮಾಡಿ
    - ನನಗೆ ಸಿಲಿಂಡರ್ ಬುಕ್ ಮಾಡಿ
    - ದಯವಿಟ್ಟು ನನಗೆ ಗ್ಯಾಸ್ ಸಿಲಿಂಡರ್ ಬುಕ್ ಮಾಡಿ
    - ದಯವಿಟ್ಟು ಗ್ಯಾಸ್ ಬುಕ್ ಮಾಡಿ
    - ದಯವಿಟ್ಟು ಗ್ಯಾಸ್ ಸಿಲಿಂಡರ್ ಅನ್ನು ಬುಕ್ ಮಾಡಿ
- intent: out_of_scope
  examples: |
    - How are you
    - what are you doing
    - i need help
    - i want gas papers
    - i need to book gas papers
    - i am looking for my gas papers
    - book my ticket
    - ਮੇਰੀ ਟਿਕਟ ਬੁੱਕ ਕਰੋ
    - ਤੁਸੀ ਕਿਵੇਂ ਹੋ
    - Tusi kivem ho
    - ਤੁਹਾਡਾ ਨਾਮ ਕੀ ਹੈ
    - உங்கள் பெயர் என்ன
    - നിന്റെ പേരെന്താണ്
    - તું શું કરે છે
    - તમારું નામ શું છે
    - મારી પ્રિય મૂવી ડાર્ક છે
    - ನನಗೆ ಕ್ರಿಕೆಟ್ ನೋಡಲು ಇಷ್ಟ
    - ನಿನ್ನ ಹೆಸರೇನು
    - ನನಗೆ ನಿನ್ನ ಸಹಾಯ ಬೇಕು
    - எனக்கு உங்கள் உதவி தேவை

Domain File NLU

intents:
  - GASBOOKING_en
  - GASBOOKING_hi
  - GASBOOKING_mr
  - GASBOOKING_gu
  - GASBOOKING_kn
  - out_of_scope

I am using the pipeline below, as per the Rasa documentation. As you can see, I am not using any pre-trained model:

Config File

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7
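
For readers unfamiliar with the last component: as far as I understand, FallbackClassifier swaps the predicted intent for nlu_fallback when the top confidence is below the configured threshold (0.7 here). The snippet below is only a rough Python sketch of that rule, not Rasa's actual implementation; the helper name `apply_fallback` and the confidence values are made up for illustration.

```python
from typing import Dict

FALLBACK_INTENT = "nlu_fallback"  # default name Rasa gives the fallback intent


def apply_fallback(confidences: Dict[str, float], threshold: float = 0.7) -> str:
    """Return the top intent, or the fallback intent if it is not confident enough.

    Hypothetical helper: it mirrors the thresholding idea only, not Rasa's code.
    """
    if not confidences:
        return FALLBACK_INTENT
    top_intent = max(confidences, key=confidences.get)
    if confidences[top_intent] < threshold:
        return FALLBACK_INTENT
    return top_intent


# Made-up confidences for illustration
print(apply_fallback({"GASBOOKING_en": 0.91, "out_of_scope": 0.09}))  # GASBOOKING_en
print(apply_fallback({"GASBOOKING_en": 0.55, "out_of_scope": 0.45}))  # nlu_fallback
```

Note that a threshold only catches uncertain predictions. A message that is classified wrongly but with high confidence still passes through.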

Please advise whether this is the correct approach for dealing with multilingual conversations. Also, how effective would the out_of_scope intent be in such cases, given that we would need to provide many more examples for it?

anishbapna · January 20, 2021, 1:40pm

Dear Ashutosh, we have a similar requirement in mind, but we are really not sure where to start. Looking forward to support from the Rasa community.

koaning · January 20, 2021, 1:41pm

Interesting!

My name is Vincent and I’m trying to add more support for Non-English languages in Rasa. There are a few things that jump to mind but I’ll gladly hear it if I am missing something.

We support a language-agnostic variant of BERT. It’s a pretrained model from Google, and the appendix of the original paper suggests that English, Hindi, Marathi, Gujarati and Kannada are indeed supported. In order to use it you’ll want to configure a LanguageModelFeaturizer with the rasa/LaBSE weights. Note that LaBSE is an abbreviation for Language-agnostic BERT Sentence Embedding. A downside of this approach is that it is very “heavy”: there’s a lot of compute time involved.
I maintain a project over at rasa-nlu-examples which supports many pre-trained word vectors that might also help. The BytePair embeddings hosted there are available in 250+ languages and could offer a more lightweight way of adding pre-trained features to your pipeline. You can find more info in the docs.

One question for my own understanding, though. It seems like you’re interested in making a single assistant that can handle many languages. So I wonder, what responses do you send, and in what language? Is there a reason why you’re not considering making multiple assistants, one for each language?

ASINGH · January 20, 2021, 6:04pm

Hi Vincent,

You understood it correctly: we are trying to make a single assistant that can handle many languages (there are 22 major languages in India, written in 13 different scripts). We are considering this because our NLU is limited to a small, fixed set of questions (LPG booking, mechanic visit, etc.), so we hope to cover these questions in all languages with a unique intent for each. The response is then given in the user’s language. Making a separate assistant for each language is not the goal since, as you can see, there would be many of them.

I am not sure whether I need to give a large number of examples for the out_of_scope intent. The problem I am currently facing is that my trained NLU model produces false positives for messages that should go into out_of_scope (e.g. if a user sends 'I like watching movies', the NLU interpretation is GASBOOKING_en). Once I add similar examples to out_of_scope, it identifies them correctly. I have used an English example, but this is happening with all languages.

It’s actually working well, except for the false positive cases.

I will surely consider your suggestions.

koaning · January 21, 2021, 8:50am

The simple truth behind out of scope detection is that it is, certainly to my understanding, an unsolved problem. I’ve written down some technical details on why in this forum post but you might also appreciate this algorithm whiteboard video on fallback detection for some extra details.

One thing you might consider doing is to have multiple types of out_of_scope. If you have a look at our rasa-demo you’ll notice that you can pre-define many types of out-of-scope that should be detected. In your case, you might be able to have out-of-scope classes for each language.

This is a path that’s reasonable, but I wouldn’t spend too much time on it immediately. The fact that there are many out of scope situations imaginable doesn’t mean that they actually occur. It’s still best to look at examples from actual users as a source of inspiration for out-of-scope categories.

Out of curiosity, when you send the response to the user, how do you determine the correct language/text to send back? Is this handled by a custom action?

ASINGH · January 21, 2021, 11:48am

OK, we are sending back two types of responses:

When the intent is GASBOOKING_LANGUAGE, we know the language code from the last two characters of the intent name (e.g. for GASBOOKING_hi the response should go out in Hindi).
When the intent is out_of_scope: in my case there is only one out_of_scope category for all languages, so I use the polyglot library for language detection and then send an appropriate message to the user in the detected language.
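
The first case can be sketched in a few lines of Python. This is only an illustration of the naming convention described above, not the actual project code; `language_from_intent` and `SUPPORTED_LANGUAGES` are hypothetical names. It splits on the last underscore and checks the suffix against a known set, so that a name like out_of_scope is not mistaken for a language-tagged intent.

```python
from typing import Optional

# Language codes used as intent suffixes in this thread
SUPPORTED_LANGUAGES = {"en", "hi", "mr", "gu", "kn"}


def language_from_intent(intent_name: str) -> Optional[str]:
    """Return the language suffix of an intent name, or None if there is none.

    The suffix after the last underscore is validated against the known
    language codes instead of blindly taking the last two characters.
    """
    _, sep, suffix = intent_name.rpartition("_")
    if sep and suffix in SUPPORTED_LANGUAGES:
        return suffix
    return None


print(language_from_intent("GASBOOKING_hi"))    # hi
print(language_from_intent("out_of_scope"))     # None
print(language_from_intent("out_of_scope_gu"))  # gu
```

A `None` result is the signal to fall back to language detection on the message text. The same helper would keep working if out_of_scope were later split into per-language intents such as out_of_scope_gu.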

As you suggested, making an out_of_scope intent for each language could, I guess, be more effective.

koaning · January 21, 2021, 12:15pm

Are you using a custom action that’s using polyglot to handle the responses?

ASINGH · January 21, 2021, 12:22pm

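Yes, I am using a custom action that determines the language (script) of the message.

Example:

    from polyglot.detect import Detector

    # Sample Gujarati message: "I want to book"
    msg_text = 'હું બુક કરવા માંગુ છું'
    detector = Detector(msg_text)
    # detector.language holds the detected language (name, code, confidence)
    print(detector.language)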
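
If only the script is needed, a dependency-free alternative is to look at Unicode character names. The sketch below is not what polyglot does internally; it is a standard-library approximation, and `dominant_script` and `KNOWN_SCRIPTS` are hypothetical names chosen for illustration.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Script names as they appear at the start of Unicode character names
KNOWN_SCRIPTS = (
    "DEVANAGARI", "GUJARATI", "KANNADA", "GURMUKHI",
    "TAMIL", "MALAYALAM", "TELUGU", "BENGALI", "LATIN",
)


def dominant_script(text: str) -> Optional[str]:
    """Return the most frequent known script among the characters of text."""
    counts = Counter()
    for ch in text:
        # Unicode names look like "GUJARATI LETTER HA" or "LATIN SMALL LETTER A"
        name = unicodedata.name(ch, "")
        for script in KNOWN_SCRIPTS:
            if name.startswith(script + " "):
                counts[script] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(dominant_script("હું બુક કરવા માંગુ છું"))  # GUJARATI
print(dominant_script("मला गॅस बुक करायचा आहे"))  # DEVANAGARI
```

Script is not the same as language: Hindi and Marathi both use Devanagari, and a romanized message such as "Tusi kivem ho" would be reported as LATIN. A real language detector is still needed for those cases.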

baval · August 5, 2021, 5:29am

Hi @ASINGH, I am also working on a multilingual bot. Is your code open source? It would be a great help if I could access it…

AnimeshPandey · December 17, 2021, 7:12am

Hi @baval and @ASINGH, I want to know if you were successful in creating a multilingual bot. It would be a huge help if I could see it…

baval · December 17, 2021, 7:30am

Hi Animesh, yes sure

AnimeshPandey · December 20, 2021, 3:38am

Please tell me how we can connect @baval. Thank you!

AnimeshPandey · December 21, 2021, 6:09am

Hi Baval, could you please share the GitHub link of the multilingual chatbot? It would help me.

baval · December 21, 2021, 6:22am

I am looking into my schedule to see when we can meet.

baval · December 21, 2021, 6:23am

Actually, I can’t share the GitHub repo right now since it’s a client project, but I will make you familiar with the code.

AnimeshPandey · December 28, 2021, 6:24am

Still hoping you might help. It would be great if you could.

baval · December 28, 2021, 6:35am

Sure, sorry, I completely forgot. Can you please ping me on LinkedIn?

Prakashram27 · July 26, 2023, 7:22am

Hi all… Any reference for creating a multilingual chatbot would be helpful.

anoopshrma · July 26, 2023, 7:29am

One way I would approach a multilingual bot is to create a custom component that takes the user input and uses an LLM (OpenAI or an open-source one) to convert it into English. That way you won’t have to train the model on every language.

OpenAI is pretty good at detecting many languages. It can fail on some edge cases, but it would work in most cases.

A second way would be to make it button-based for major actions, such as offering options to choose which service to take.
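
A rough, framework-agnostic sketch of the first approach follows. It does not use the Rasa custom component API or any real LLM client. `to_english` is a hypothetical helper, and `translate` is a placeholder for whatever translation backend you plug in; `fake_translate` is a lookup-table stand-in used only to keep the example runnable.

```python
from typing import Callable


def to_english(text: str, translate: Callable[[str], str]) -> str:
    """Return text unchanged if it is plain ASCII, else pass it to `translate`.

    `translate` is a placeholder for the translation backend
    (an LLM call, a translation API, ...); nothing here is a real client.
    """
    stripped = text.strip()
    if not stripped or stripped.isascii():
        # Cheap shortcut: ASCII-only input is assumed to be English already.
        # Romanized input such as "Tusi kivem ho" would slip through here.
        return stripped
    return translate(stripped)


def fake_translate(text: str) -> str:
    """Stand-in translator backed by a lookup table, for demonstration only."""
    table = {"मैं गैस बुक करना चाहता हूं": "I want to book gas"}
    return table.get(text, text)


print(to_english("मैं गैस बुक करना चाहता हूं", fake_translate))  # I want to book gas
print(to_english("Book a cylinder for me", fake_translate))       # Book a cylinder for me
```

The English output would then go to an English-only NLU model. The detected source language still has to be remembered somewhere so that the reply can be sent back in the user's language.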

Prakashram27 · July 26, 2023, 7:42am

Could you please provide a source code reference for the first approach?
