Hello, I have installed Rasa X using Docker Compose on an Azure instance and have been making changes through the GUI (intents and so on). Now I want to create a chatbot in Arabic instead of English, so I downloaded the BytePair embeddings using these commands:
pip3 install flair
pip3 install bpemb
from flair.embeddings import BytePairEmbeddings
embedding = BytePairEmbeddings('multi')
Output:
Setting dim=300 for multilingual BPEmb
downloading https://nlp.h-its.org/bpemb/multi/multi.wiki.bpe.vs100000.model
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1965223/1965223 [00:01<00:00, 1916803.68B/s]
downloading https://nlp.h-its.org/bpemb/multi/multi.wiki.bpe.vs100000.d300.w2v.bin.tar.gz
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 112202964/112202964 [00:08<00:00, 13721233.23B/s]
What should I do next to create a chatbot for Arabic?
You can make a chatbot in Arabic the same way as English, with no further steps needed.
I made a chatbot that speaks English, French, Arabic, Lebanese, and Armenian, without requiring additional steps.
You mean I just have to specify the pipeline like this?
language: ar
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: EmbeddingIntentClassifier
    intent_tokenization_flag: true
    intent_split_symbol: "+"
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
policies:
  - name: MemoizationPolicy
  - name: RulePolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
Yup, it even works without mentioning `ar`.
PS: You can use three backticks (```) a line before and after code to properly format it
```
Like this
```
What should I put in `language`?
How does it pick the right tokenizer, featurizer, and so on? Arabic is different from English.
What if someone doesn't know any English and finds English difficult to understand?
You can mention `language: ar`; I simply said it is not a requirement for it to work.
Like I said previously, my bot speaks English, French, Arabic, Lebanese, and Armenian, and in my pipeline I mentioned `language: en`.
Rasa is language-independent: the featurizers turn whatever text you give them into numeric vectors, so in the end it all comes down to 1s and 0s.
Of course, it’s more complicated than that. Watch the Rasa Algorithm Whiteboard playlist to learn more about how stuff works theoretically.
I'm not sure what you're trying to say here. Your bot is capable of understanding and replying in Arabic. No English required.
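One way to see why a pipeline like the one above needs no language-specific rules: a whitespace tokenizer and a `char_wb` count featurizer only split on spaces and count character n-grams, so Arabic script and Latin script go through exactly the same code path. The sketch below is a simplified stand-in written in plain Python, not Rasa's or scikit-learn's actual implementation; the function name `char_wb_ngrams` is made up for illustration, and the `min_n`/`max_n` defaults just mirror the `min_ngram: 1` and `max_ngram: 4` settings from the config.

```python
from collections import Counter


def char_wb_ngrams(text: str, min_n: int = 1, max_n: int = 4) -> Counter:
    """Count character n-grams inside word boundaries (simplified sketch)."""
    counts = Counter()
    for word in text.split():      # whitespace tokenization, no language rules
        padded = f" {word} "       # pad so n-grams can capture word edges
        for n in range(min_n, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return counts


# The same function handles Arabic script and Latin-script Arabic alike.
arabic = char_wb_ngrams("صباح الخير")
arabizi = char_wb_ngrams("ahla w sahla")

print(arabizi["hla"])  # 2: once in "ahla", once in "sahla"
print(len(arabic) > 0)  # True: Arabic characters are counted like any others
```

The classifier then learns from these counts, which is why the training examples themselves, rather than the `language` key, decide what the bot understands.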
Thank you for giving me the reply. I hope it will work for me.
Good luck!
You can write your data in Arabic with no problem:
- intent: greet
  examples: |
    - مرحبا
    - أهلا بك
    - صباح الخير
    - مساء الخير
    - اهلا
    - بونسوار
You can also add Arabish (Arabic written in Latin characters):
- intent: greet
  examples: |
    - kifak
    - ahla
    - ahla w sahla
    - ahlen
    - mar7aba
    - مرحبا
    - أهلا بك
    - صباح الخير
    - مساء الخير
    - اهلا
    - بونسوار
You can even mix languages, like here: fyp-chatbot/chitchat.yml at main · ChrisRahme/fyp-chatbot · GitHub
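If you do mix scripts in one intent, it can help to check how many examples you have in each, since the model only learns what it sees. Here is a rough sketch using only the Python standard library; the helper name `script_of` and the sample list are hypothetical, and the check relies on Unicode character names, so treat it as a quick sanity check rather than real language detection.

```python
import unicodedata
from collections import Counter


def script_of(text: str) -> str:
    """Return 'arabic', 'latin', 'mixed', or 'other' based on the letters used."""
    has_arabic = has_latin = False
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, punctuation, and digits like the 7 in "mar7aba"
        name = unicodedata.name(ch, "")
        if name.startswith("ARABIC"):
            has_arabic = True
        elif name.startswith("LATIN"):
            has_latin = True
    if has_arabic and has_latin:
        return "mixed"
    if has_arabic:
        return "arabic"
    if has_latin:
        return "latin"
    return "other"


# A few greetings taken from the examples in this thread.
examples = ["kifak", "ahla w sahla", "mar7aba", "مرحبا", "صباح الخير", "بونسوار"]
summary = Counter(script_of(e) for e in examples)
print(summary["latin"], summary["arabic"])  # 3 3
```

If one script turns out to have far fewer examples than the other, adding a few more for it is probably the easiest fix.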
If it helped you, please mark my answer as the solution for future readers.