How to build a Rasa chatbot in Arabic?

Hi,

I want to build a Rasa chatbot in Arabic. What are the steps? What configuration do I need exactly (in config.yml, can I directly set language: ar)?

Thank you.

Hi @mellahysf. Since spaCy doesn’t have a language model for Arabic, you could try using fastText embeddings instead. The rasa-nlu-examples components make it easy to incorporate these embeddings into your pipeline configuration. Here is an example pipeline that uses the fastText word embeddings for Dutch. Give it a go with Arabic: you would have to download the Arabic embeddings and point the pipeline at them (via the cache_dir and file parameters). It would be great to know how that works for you.
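As a rough sketch of what that could look like for Arabic: this assumes the rasa-nlu-examples package is installed and that the Arabic Common Crawl vectors from fasttext.cc (cc.ar.300.bin) have been downloaded into a local folder named downloaded. The folder name, the extra featurizer and the classifier settings are illustrative assumptions, not a tested setup, so please check the rasa-nlu-examples documentation for the exact parameters of the version you install. Because spaCy isn't involved here, setting language: ar should work.

```yaml
# config.yml: a minimal sketch, not a tested pipeline.
# Assumes rasa-nlu-examples is installed and the Arabic fastText
# vectors (cc.ar.300.bin) were downloaded into ./downloaded (assumed folder name).
language: ar

pipeline:
# Arabic words are separated by spaces, so the built-in tokenizer is a reasonable start.
- name: WhitespaceTokenizer
# Sparse character n-gram features; these may help with Arabic prefixes and suffixes.
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
# Dense fastText word vectors, loaded from cache_dir/file.
- name: rasa_nlu_examples.featurizers.dense.FastTextFeaturizer
  cache_dir: downloaded
  file: cc.ar.300.bin
# Intent classification and entity extraction on top of the features above.
- name: DIETClassifier
  epochs: 100
```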

@mellahysf I am the maintainer of that project. I’m currently experimenting with a tool that might offer a better tokenizer for Arabic. I’m not familiar with Arabic, so any feedback you have would be very valuable. It’s an experimental feature, though, so feel free to AskMeAnything[tm] if you'd like help. There’s a discussion here with an Estonian user of the feature; you should be able to follow along by just switching the language setting.

OK, thank you @Juste for this reply. I will try it :slight_smile:

Thank you very much @koaning.


Also, as @Juste mentioned, the fastText embeddings in rasa-nlu-examples support Arabic, but they tend to be very heavyweight (6-7GB). There are also the more lightweight BytePair embeddings. There’s a small guide on how to use them on the rasa-nlu-examples documentation page, and you can check the BPEmb project website for more detailed information.

The reason for bringing it up: it seems like there’s also support for some Arabic variants.
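For comparison, here is a sketch of the lighter BytePair variant. It assumes the BytePairFeaturizer component from rasa-nlu-examples with lang, vs (vocabulary size) and dim (vector dimension) parameters; the values below are placeholders, and BPEmb only publishes certain vocabulary size and dimension combinations per language, so check the BPEmb website for the Arabic options before picking them.

```yaml
# config.yml: a minimal sketch using BytePair (BPEmb) subword embeddings instead of fastText.
# Assumes rasa-nlu-examples is installed; vs and dim values are placeholders.
language: ar

pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
# lang = BPEmb language code, vs = vocabulary size, dim = embedding size.
- name: rasa_nlu_examples.featurizers.dense.BytePairFeaturizer
  lang: ar
  vs: 10000
  dim: 100
- name: DIETClassifier
  epochs: 100
```

Swapping this featurizer in for the fastText one leaves the rest of the pipeline unchanged, while the embedding download is much smaller than the multi-gigabyte fastText file.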