I want to build an on-premise Arabic AI chatbot with Rasa. I understand Rasa supports on-premise deployments, but what about support for Arabic? Do I need to use any external translators?
You can train with your NLU data in Arabic using the embedding classifier. Keep in mind that you will need a lot of examples.
In my case, the user will type in Arabic script and the bot needs to reply in Arabic, unlike Bengali typed in English script (Tumi kemn acho? — How are you?).
I hope that following your approach (a Rasa NLU chatbot with spaCy and FastText) should work?
Yeah, it will still work with the embedding classifier even if your script is Arabic.

Here's how the classifier actually works:
Each word in your NLU training data is first tokenised. (In Arabic, a whitespace tokeniser should actually work, meaning that, as in English, "I am going to eat" can be tokenised as ["I", "am", "going", "to", "eat"] by splitting on the whitespace in between. I am not sure whether that is possible in Arabic or not.)
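To illustrate, whitespace splitting works the same way on Arabic strings as on English ones, since Arabic also separates words with spaces. A minimal sketch (the Arabic sentence below is my own example, roughly "how are you today?"):

```python
# Whitespace tokenization: split the sentence on spaces.
english = "I am going to eat"
arabic = "كيف حالك اليوم"  # hypothetical example sentence

print(english.split())  # ['I', 'am', 'going', 'to', 'eat']
print(arabic.split())   # three Arabic tokens, split on the spaces
```

So at the tokenisation step, nothing language-specific is strictly required as long as the script uses spaces between words.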
After tokenisation, starting from an initial vector per word, it creates vectors for all the words in your training data, including your intent names, and uses the intent vectors to build a non-linear classifier based on Facebook's StarSpace algorithm. There is a paper on that.
However, for out-of-vocabulary words it won't give a useful answer unless you have a lot of training data. This is one way of making a classifier completely language-independent.
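For reference, a pipeline config using this language-independent embedding classifier might look like the following (component names as in the Rasa NLU docs of this era; a sketch, so check them against your installed version):

```yaml
language: ar
pipeline:
- name: "tokenizer_whitespace"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
```

Since the featurizer builds its vocabulary purely from your own training examples, no pretrained Arabic model is needed here.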
The second approach is to use the vectors generated by FastText. You can convert the FastText vectors to spaCy's format and use them to create a sklearn classifier. However, this won't work for entity extraction, as we don't have any tagger or parser.
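One way to do that conversion is spaCy 2.x's `init-model` command, which builds a loadable spaCy model directory straight from a word-vector file. A rough sketch (the download URL and filename are assumptions; check the FastText site for the current Arabic vector links):

```shell
# Download the pretrained Arabic FastText vectors (large file)
wget https://dl.fbaipublicfiles.com/fasttext/vectors-crawl/cc.ar.300.vec.gz

# Build a spaCy model directory from the vectors (spaCy 2.x CLI)
python -m spacy init-model ar ./ar_model --vectors-loc cc.ar.300.vec.gz
```

The resulting `./ar_model` directory can then be pointed to in your pipeline config as the spaCy model, but as noted above it only supplies word vectors, not a tagger or parser.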
Check this documentation on the different components.
Aren't those words just zero vectors then? And if all of them are, the intent comes out as None?
Could you then use this FastText approach together with a custom NER_CRF?
I haven't tested the language-agnostic CRF, and you are right about OOV words: the intent should come out as None.