Entity Recognition for a Non-English Language

Hello,

I used Rasa for Arabic with the default configuration (the intent, entity, and response models and the pipeline).

When I tested “intent classification”, it worked very well. The “entity recognition for Arabic”, on the other hand, does not seem to be accurate.

The entity recognition models in the configuration are: DIETClassifier and EntitySynonymMapper.

As mentioned in the documentation, DIETClassifier (a conditional random field on top of a transformer) is good for “custom” training.

So I do not know which entity model is the most suitable one.

Maybe I need to use my own Arabic ER model, especially if I want the model to predict entities based on the context? I think there are no trained models in Rasa for Arabic, right?

Thanks,

Hello, I’m also working on a non-English language, one without any spaces to separate words. I found that joint intent classification and entity recognition performs worse than a plain CRF entity extractor. That is probably because either I’m not using proper word vectors for the model or because there are no spaces. You can try a separate DIETClassifier for each task, entity extraction and intent classification, like this:

...
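# First DIETClassifier: entity extraction only, with no transformer layers
- name: DIETClassifier
  intent_classification: False
  epochs: 50
  number_of_transformer_layers: 0
# Second DIETClassifier: intent classification only
- name: DIETClassifier
  entity_recognition: False
  epochs: 50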

or just like my config setup:

- name: CRFEntityExtractor
- name: DIETClassifier
  entity_recognition: False
  epochs: 50

Another thing to check is the number of entity examples in your training data. A lookup table might help as well.
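
To illustrate both points, a minimal sketch of NLU training data with annotated entity examples and a lookup table could look like the following. It assumes the YAML training data format (Rasa 2.x or later; older versions use Markdown instead), and the intent name book_flight, the entity name city, and the example sentences are all made up for illustration. As far as I know, the lookup table only affects CRFEntityExtractor if RegexFeaturizer comes before it in the pipeline, and you still need some annotated examples for the entity.

```yaml
# Sketch only: the intent, entity, and example values are hypothetical.
version: "3.1"
nlu:
- intent: book_flight
  examples: |
    - أريد السفر إلى [القاهرة](city)
    - احجز لي رحلة إلى [دبي](city)
- lookup: city
  examples: |
    - القاهرة
    - دبي
    - الرياض
```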

Thanks a lot