I would like to ask whether there are Arabic-language components that can be used in the config file for intent classification, entity extraction, dates and times, etc.
I have searched several times but haven’t found an answer anywhere yet.
I’m no expert in Arabic, but I have implemented some components that may help out. The main components that I’ve added are featurizers. These add dense pre-trained embeddings to your pipeline, which should make it a lot easier for DIET to pick up entities and intents. You can find out more in the documentation on BytePairEmbeddings here. Our LanguageModelFeaturizer also integrates with Hugging Face, which offers some non-English models too. In particular, we’ve heard good stories about LaBSE. If you’d like an explainer on how LaBSE works, check the video here.
I just checked and it seems that my dateparser component also supports Arabic. It’s based on the dateparser library, and their docs suggest they support many dialects of Arabic too. I cannot confirm the quality of the tool, but if you have any feedback, I’m all ears.
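If you want to get a feel for the quality before wiring it into a pipeline, you could try the underlying dateparser library directly. This is just a rough sketch: it assumes you have run `pip install dateparser`, the helper name `parse_arabic_date` is made up for this post, and the example phrase is my own. It calls the library itself, not the Rasa component.

```python
from datetime import datetime
from typing import Optional

try:
    import dateparser  # third-party: pip install dateparser
except ImportError:
    dateparser = None  # lets the sketch load even if the library is missing


def parse_arabic_date(text: str) -> Optional[datetime]:
    """Hypothetical helper: parse `text` as an Arabic date expression.

    Returns None when dateparser is not installed or the text
    cannot be interpreted as a date.
    """
    if dateparser is None:
        return None
    # languages=["ar"] restricts parsing to Arabic rather than
    # letting the library guess the language.
    return dateparser.parse(text, languages=["ar"])


if __name__ == "__main__":
    # "15 يناير 2023" is meant to read "15 January 2023".
    print(parse_arabic_date("15 يناير 2023"))
```

Running a handful of phrases from your own users through this should tell you fairly quickly whether the dialect coverage is good enough for your case.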
Oh! And there’s one more thing. We also have namelists available if you’re interested in detecting names as entities; the reasoning is explained in more detail in this video. The rasa-nlu-examples repository contains namelists from around the world, including a set of Arabic names. These names can be used in our RegexEntityExtractor or in the FlashTextEntityExtractor.
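To give an idea of what a namelist lookup does, here is a minimal sketch in plain Python. It is not the code of either extractor; the four names are a tiny sample I picked for illustration (the real lists live in the repository), and `build_name_pattern` and `extract_names` are made-up helper names.

```python
import re

# Hypothetical sample; a real namelist would be much longer.
ARABIC_NAMES = ["محمد", "فاطمة", "علي", "عائشة"]


def build_name_pattern(names):
    """Compile one regex that matches any name in the list as a whole word."""
    # Longest names first, so a longer name wins over a shorter prefix.
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in ordered)
    # The lookarounds reject matches that sit inside a larger word.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")


def extract_names(text, pattern):
    """Return one dict per match with the value and character offsets."""
    return [
        {"entity": "name", "value": m.group(0), "start": m.start(), "end": m.end()}
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    pattern = build_name_pattern(ARABIC_NAMES)
    # The sentence is meant to read "My name is Mohammed".
    print(extract_names("اسمي محمد", pattern))
```

One caveat with whole-word matching like this: Arabic attaches some prefixes (such as the conjunction و) directly to the following word, and a name written that way would be missed by this sketch.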