Building a bot for local language

atwine · January 11, 2022, 4:55pm

Hey Nik, any ideas on how to integrate two languages in a bot, so that if someone types one language the bot is able to detect it and switch?

How about adding pipelines for local languages that are not supported in rasa?
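One simple way to picture the "detect and switch" step is a marker-word lookup. The sketch below is plain Python, not a Rasa component. The `MARKERS` table and the `detect_language` helper are hypothetical names, and the tiny word lists are illustrative only. Luganda (code `lg`) is used as the second language because it is the one discussed later in this thread.

```python
# Tiny, hand-picked marker words: an illustrative assumption, not a real lexicon.
MARKERS = {
    "en": {"the", "is", "you", "how", "thank", "hello"},
    "lg": {"oli", "otya", "webale", "nnyo", "ssebo", "nnyabo"},
}


def detect_language(text: str, default: str = "en") -> str:
    """Guess the language by counting marker words; fall back to `default`."""
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & words) for lang, words in MARKERS.items()}
    best = max(scores, key=scores.get)
    # With no marker word at all, there is no evidence either way.
    return best if scores[best] > 0 else default
```

A real bot would need a far larger vocabulary or a trained classifier, but the control flow (score each language, pick the best, fall back to a default) would likely look similar.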

nik202 · January 12, 2022, 12:52pm

I haven’t personally implemented such a use case, so let’s get some help or suggestions from Chris. Pinging @ChrisRahme for help. Many thanks in advance.

ChrisRahme · January 12, 2022, 3:36pm

Please see the following posts/thread:

atwine · January 13, 2022, 6:08am

Thank you @ChrisRahme

Do you have any implementation examples, maybe a moodbot with the advice you gave, so I can run it and have some more context?

ChrisRahme · January 13, 2022, 6:11am

There’s my chatbot, but it doesn’t use a custom component.

Instead, it asks the user which language they want to talk in at the start of the conversation. The bot will always understand 5 “languages” mentioned in the NLU, but will only respond in the language the user selected.
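A minimal sketch of that idea in plain Python, not taken from the bot mentioned above: the selected language is stored once, and every reply is looked up under it. The `RESPONSES` table, the `respond` helper, and the sample texts are hypothetical.

```python
# Hypothetical illustration of "understand everything, answer in one language".
RESPONSES = {
    "greet": {
        "en": "Hello! How can I help?",
        "fr": "Bonjour ! Comment puis-je vous aider ?",
    },
    "goodbye": {"en": "Goodbye!", "fr": "Au revoir !"},
}


def respond(intent: str, language: str, default_language: str = "en") -> str:
    """Return the reply for `intent` in the language the user selected."""
    by_language = RESPONSES[intent]
    # Fall back to the default language when no translation exists.
    return by_language.get(language, by_language[default_language])
```

For example, `respond("greet", "fr")` returns the French greeting, while an unsupported language code falls back to English.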

atwine · January 13, 2022, 6:17am

@ChrisRahme Thank you very much, let me have a look at it and get back to you. Thank you.

atwine · January 20, 2022, 9:11am

Hey @ChrisRahme

I wanted to know more about how you chose the pipeline. I thought I would need to build custom word embeddings for the language I want to use. Or is it possible to work with the default pipeline, since the alphabet is like the English one, only missing a few letters?

souvikg10 · January 20, 2022, 1:03pm

Hello @atwine ,

Before you go down the rabbit hole of building custom word embeddings, which language are you building the bot for?

There are already a lot of pre-trained embeddings for low-resource languages available from spaCy and fastText, and some BERT variants too.

Also, the default self-supervised embeddings can work if you have a decent number of examples per intent (say about 15-20), as long as the language’s words can be split using WhitespaceTokenizer. See the docs on how it splits tokens.

atwine · January 20, 2022, 2:47pm

Hello @souvikg10

Thanks, the language I am trying to build for is Luganda (a Ugandan local dialect). Ideally my bot should work for English and Luganda. Luganda mostly uses the English alphabet characters, and I think a whitespace tokenizer would do fine.

So you think I don’t have to try to build custom embeddings?
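As a quick sanity check of the whitespace assumption, the sketch below splits a couple of sample Luganda phrases on whitespace. It is a rough stand-in written in plain Python, not Rasa's actual `WhitespaceTokenizer`; the `whitespace_tokens` helper is a hypothetical name.

```python
from typing import List


def whitespace_tokens(text: str) -> List[str]:
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


for phrase in ["Oli otya", "Webale nnyo"]:
    print(whitespace_tokens(phrase))
```

If the words a speaker would expect come out as separate tokens, a whitespace-based tokenizer is probably a reasonable starting point.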

souvikg10 · January 20, 2022, 3:22pm

You can try both

A. Try the self-supervised embeddings first and see if they fit your needs. If they do, you don’t need anything else.

B. Enhance it with pre-trained embeddings in Luganda: https://dl.fbaipublicfiles.com/fasttext/vectors-wiki/wiki.lg.vec (DON’T CLICK ON THIS UNLESS YOU WANT TO DOWNLOAD THE VECTORS). You can follow this project on how to import these vectors into your Rasa project: FastTextFeaturizer - Rasa NLU Examples

All fastText pretrained vectors are here
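For orientation, a `.vec` file is plain text: a header line with the vocabulary size and the vector dimension, then one word per line followed by its values. The sketch below parses that layout in plain Python. The `parse_vec` helper and the three-dimensional sample data are made up for illustration, and this is not how the FastTextFeaturizer loads vectors.

```python
from typing import Dict, Iterable, List


def parse_vec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse fastText text-format vectors into a word -> vector mapping."""
    it = iter(lines)
    # Header: "<vocabulary size> <dimension>"
    _count, dim = (int(x) for x in next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or empty lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Made-up sample in the same layout as a real .vec file.
sample = ["2 3", "webale 0.1 0.2 0.3", "nnyo 0.4 0.5 0.6"]
vectors = parse_vec(sample)
```

With a downloaded file, the same function could be fed an open file handle, for example `parse_vec(open(path, encoding="utf-8"))`, where `path` points to wherever the vectors were saved.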

atwine · January 20, 2022, 3:27pm

@souvikg10

Thank you very much, this is a great place to start. I am beginning with part A. I have built a minimal bot that works in English and Luganda. Let me share it so you can have a look: covid.yml (714 Bytes) eng.yml (574 Bytes) nlu.yml (1.7 KB) rules.yml (413 Bytes) stories.yml (2.2 KB) config.yml (1.4 KB) domain.yml (2.9 KB)

This is the output:

souvikg10 · January 20, 2022, 7:15pm

Looks like it is working. Well done! Some years back I worked on the Swahili language with the same pipeline, and in my experience it works quite well for most short task flows.

atwine · January 20, 2022, 7:17pm

Thanks @souvikg10

I have a question: if I use spaCy (it’s the one I am using on my English bot with more than 100 intents), how will I combine it with the WhitespaceTokenizer? Will I just add it to the pipeline?

Does this pipeline make sense?

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
# No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
# If you'd like to customize it, uncomment and adjust the pipeline.
# See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: SpacyNLP
    model: en_core_web_md
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
    pooling: mean
  # - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
# No configuration for policies was provided. The following default policies were used to train your model.
# If you'd like to customize them, uncomment and adjust the policies.
# See https://rasa.com/docs/rasa/policies for more information.
  - name: MemoizationPolicy
  - name: RulePolicy
  - name: UnexpecTEDIntentPolicy
    max_history: 5
    epochs: 100
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    constrain_similarities: true

souvikg10 · January 20, 2022, 11:52pm

You will need the same config for both languages if you are to follow @ChrisRahme’s steps. Right, Chris?

ChrisRahme · January 21, 2022, 1:58pm

Nice job on your bot, Atwine. And thanks for the help, Souvik.

My bot used a single pipeline for all languages, and all the NLU data were mixed together. Your bot is already more advanced since it can detect the language on its own. Mine can’t do that, so I couldn’t even switch configs if I wanted to.

Pretty sure you can use Spacy with the Whitespace Tokenizer, but I think it would be better to put it before any Featurizers.

atwine · January 21, 2022, 3:38pm

Thanks team, let me take this direction for now. However, I wonder if it will hold when the number of intents grows, since I will now have to make two of each.

ChrisRahme · January 23, 2022, 8:33am

Ah so you went with making an intent per language.

This solution works smoothly but indeed the number of intents, stories, rules, and responses grows by N whenever you add a new language.
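To make the growth concrete: with one intent per language, the total is the number of base intents times the number of languages. The sketch below enumerates such a set in plain Python. The `per_language_intents` helper and the `<intent>_<language>` naming scheme are assumptions for illustration, not something taken from either bot in this thread.

```python
from itertools import product
from typing import List, Sequence


def per_language_intents(intents: Sequence[str], languages: Sequence[str]) -> List[str]:
    """Build one intent name per (intent, language) pair."""
    return [f"{intent}_{lang}" for intent, lang in product(intents, languages)]


base = ["greet", "goodbye", "ask_symptoms"]
names = per_language_intents(base, ["en", "lg"])
print(len(names))  # 3 base intents x 2 languages
```

Adding a third language to this example would bring the total from 6 to 9 intents, and stories, rules, and responses would typically scale the same way.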

Prabakaran · September 23, 2024, 7:23am

@ChrisRahme, I’m trying to build a multilingual tourism bot. Initially I created it for English. How do I implement other languages? As you said, the number of intents, stories, rules, and responses grows by N. How do I handle that? I also want to give real-time tourism information. Tourism covers many places, so how do I collect the data (16 intents per location) for each and every location, given that it will be a huge amount of data? And how do I manage the responses for the intents?
