Rasa X Arabic language

Hello, I have installed Rasa X using Docker Compose on an Azure instance and am making changes through the GUI (intents and so on). Now I want to create a chatbot in Arabic instead of English, so I downloaded the BytePair embeddings using these commands:

pip3 install flair
pip3 install bpemb

from flair.embeddings import BytePairEmbeddings
embedding = BytePairEmbeddings('multi')

Output:

Setting dim=300 for multilingual BPEmb
downloading https://nlp.h-its.org/bpemb/multi/multi.wiki.bpe.vs100000.model
100%|██████████| 1965223/1965223 [00:01<00:00, 1916803.68B/s]
downloading https://nlp.h-its.org/bpemb/multi/multi.wiki.bpe.vs100000.d300.w2v.bin.tar.gz
100%|██████████| 112202964/112202964 [00:08<00:00, 13721233.23B/s]

What should I do next to create a chatbot for Arabic?

You can make a chatbot in Arabic the same way as English, with no further steps needed.

I made a chatbot that speaks English, French, Arabic, Lebanese, and Armenian, without requiring additional steps.

You mean I just have to specify the pipeline like this?

language: ar

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: EmbeddingIntentClassifier
    intent_tokenization_flag: true
    intent_split_symbol: "+"
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

policies:
  - name: MemoizationPolicy
  - name: RulePolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100

Yup, it even works without mentioning ar.


PS: You can use three backticks (```) a line before and after code to properly format it

```
Like this
```

What should I put in language? How does it choose the tokenizer, featurizer, etc.? Arabic is different from English, after all. What if someone doesn't know any English and finds it difficult to understand?

You can specify language: ar. I simply said it is not required for the bot to work.

Like I said previously, my bot speaks English, French, Arabic, Lebanese, and Armenian, and in my pipeline I mentioned language: en.

Rasa is language-independent. In the end, it all translates down to 1s and 0s.

Of course, it’s more complicated than that. Watch the Rasa Algorithm Whiteboard playlist to learn more about how stuff works theoretically.
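
To make the "language-independent" point a bit more concrete: the feature-building components in the pipeline above (WhitespaceTokenizer and CountVectorsFeaturizer with the char_wb analyzer) work on whitespace-separated tokens and character n-grams, not on an English dictionary. Here is a minimal sketch in plain Python. It is not Rasa's code: the helper names `whitespace_tokenize` and `char_wb_ngrams` are made up for illustration, and the n-gram logic is only a simplified approximation of what the char_wb analyzer does.

```python
from collections import Counter


def whitespace_tokenize(text):
    """Split on whitespace; no language-specific rules are involved."""
    return text.split()


def char_wb_ngrams(text, min_n=1, max_n=4):
    """Count character n-grams inside word boundaries (simplified sketch)."""
    counts = Counter()
    for token in whitespace_tokenize(text):
        padded = f" {token} "  # spaces mark the word boundaries
        for n in range(min_n, max_n + 1):
            for start in range(len(padded) - n + 1):
                counts[padded[start:start + n]] += 1
    return counts


# The same functions handle English, Arabic script, and Latin-script Arabic.
english = char_wb_ngrams("good morning")
arabic = char_wb_ngrams("صباح الخير")
arabizi = char_wb_ngrams("mar7aba")

print(whitespace_tokenize("صباح الخير"))
print(arabic.most_common(5))
```

Nothing in the sketch depends on the script of the input, which is roughly why a pipeline trained from scratch on your own examples does not care what the language setting says.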

I'm not sure what you're trying to say here. Your bot is capable of understanding and replying in Arabic. No English required.

Thank you for the reply. I hope it will work for me.

Good luck!

You can write your data in Arabic with no problem:

- intent: greet
  examples: |
    - مرحبا
    - أهلا بك
    - صباح الخير
    - مساء الخير
    - اهلا
    - بونسوار

You can also add Arabish (Arabic written in Latin characters):

- intent: greet
  examples: |
    - kifak
    - ahla
    - ahla w sahla
    - ahlen
    - mar7aba
    - مرحبا
    - أهلا بك
    - صباح الخير
    - مساء الخير
    - اهلا
    - بونسوار
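
Since the training data is just a UTF-8 text file, you could also generate it with a script. The sketch below is only one way to do it and is not part of Rasa: the helper `render_nlu` and the file name `nlu_ar.yml` are hypothetical, and the `version: "2.0"` header is an assumption that should be matched to your Rasa release.

```python
from pathlib import Path

# Example utterances from this thread, mixing Arabic script and Latin script.
GREET_EXAMPLES = ["مرحبا", "أهلا بك", "صباح الخير", "kifak", "mar7aba"]


def render_nlu(intents, version="2.0"):
    """Render {intent: [examples]} as Rasa-style NLU YAML text.

    The version value is an assumption; adjust it to your Rasa release.
    """
    lines = [f'version: "{version}"', "", "nlu:"]
    for intent, examples in intents.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        lines.extend(f"    - {example}" for example in examples)
    return "\n".join(lines) + "\n"


nlu_text = render_nlu({"greet": GREET_EXAMPLES})

# Write explicitly as UTF-8 so the Arabic characters survive on any platform.
out_path = Path("nlu_ar.yml")  # hypothetical file name
out_path.write_text(nlu_text, encoding="utf-8")
print(nlu_text)
```

The explicit `encoding="utf-8"` is the only part that matters for Arabic; the rest is ordinary string formatting.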

You can even mix languages, as in chitchat.yml in the ChrisRahme/fyp-chatbot repository on GitHub.

If it helped you, please mark my answer as the solution for future readers 🙂