Can't extract entities for the Arabic language

It worked fine for intent classification, but I couldn't extract entities.

This is my config file:

```yaml
language: "ar"

pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  intent_tokenization_flag: true
  intent_split_symbol: "_"
```
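One possible cause, offered as an assumption rather than a diagnosis: `ner_crf` labels whole tokens, so with `tokenizer_whitespace` an annotated entity that starts or ends inside a whitespace-separated word (common in Arabic, where prefixes such as ب or و attach to the following word) cannot be represented in its token-level labels. The sketch below is a simplified stand-in for the whitespace tokenizer, not Rasa's own code. It assumes entity annotations carry character `start`/`end` offsets, as in Rasa NLU's JSON training data, and reports any annotation that does not line up with token boundaries. `whitespace_tokens` and `misaligned_entities` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def whitespace_tokens(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace, keeping (token, start, end) character offsets.

    Simplified stand-in for a whitespace tokenizer (an assumption, not Rasa's code).
    """
    tokens = []
    offset = 0
    for word in text.split():
        start = text.index(word, offset)  # words come back in order, so search forward
        end = start + len(word)
        tokens.append((word, start, end))
        offset = end
    return tokens


def misaligned_entities(text: str, entities: List[Dict]) -> List[Dict]:
    """Return annotations whose start/end do not fall on token boundaries."""
    tokens = whitespace_tokens(text)
    starts = {s for _, s, _ in tokens}
    ends = {e for _, _, e in tokens}
    return [ent for ent in entities
            if ent["start"] not in starts or ent["end"] not in ends]


if __name__ == "__main__":
    value = "الرياض"  # hypothetical "city" entity value
    # In the second sentence the preposition ب is attached to the city name.
    for text in ["احجز فندق في الرياض", "احجز فندق بالرياض"]:
        start = text.index(value)
        ents = [{"start": start, "end": start + len(value),
                 "value": value, "entity": "city"}]
        status = "misaligned" if misaligned_entities(text, ents) else "ok"
        print(text, "->", status)
```

If annotations are reported as misaligned, the entity spans (or the tokenization) would need to change before a token-level extractor could learn them.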

What should I do?

Did you use the langdetect library? Can you explain your problem in more detail? I think I can help you, since I faced this kind of problem and solved it. :relaxed:

@rishier827 I'm doing entity extraction in Sinhala, and I'm facing the same kind of problems as @ahlam1234. Rasa correctly classifies intents, but many entities go unrecognized. I would like to know how you solved these problems. Thank you in advance.

@BimsaraGamage Using the langdetect library, I solved the problem of Arabic (ar) not being detected, but I'm not sure langdetect supports Sinhala.
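Since langdetect ships no Sinhala profile, its `detect` cannot return Sinhala. If all you need is to tell Arabic input from Sinhala input, a much cruder check is the Unicode script of the letters. This is a swapped-in technique, not what langdetect does (langdetect compares character n-gram statistics), and it cannot separate languages that share a script, such as Arabic, Persian and Urdu. `SCRIPT_PREFIXES` and `guess_by_script` are hypothetical names.

```python
import unicodedata
from collections import Counter
from typing import Optional

# Hypothetical mapping from a Unicode character-name prefix to a language code.
# Script is not language: Arabic script is also used for Persian and Urdu.
SCRIPT_PREFIXES = {"ARABIC": "ar", "SINHALA": "si"}


def guess_by_script(text: str) -> Optional[str]:
    """Return the code of the most frequent known script among letters, or None."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():  # skip digits, punctuation, spaces and combining marks
            continue
        name = unicodedata.name(ch, "")
        for prefix, code in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[code] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    print(guess_by_script("مرحبا"))
    print(guess_by_script("ආයුබෝවන්"))
```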

@BimsaraGamage It seems langdetect doesn't support Sinhala, but its documentation says:

You need to create a new language profile. The easiest way to do it is to use the langdetect.jar tool, which can generate language profiles from Wikipedia abstract database files or plain text.

Wikipedia abstract database files can be retrieved from “Wikipedia Downloads” (http://download.wikimedia.org/). Their file names take the form ‘(language code)wiki-(version)-abstract.xml’ (e.g. ‘enwiki-20101004-abstract.xml’).
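The language profiles mentioned here are, in essence, character n-gram frequency tables. As an illustration only, the sketch below counts 1- to 3-character n-grams from plain text and stores them as JSON with `name`, `freq` and `n_words` keys, the layout assumed here from the profiles bundled with the Python langdetect package. It is not the langdetect.jar tool: that tool also normalizes characters and prunes rare n-grams, which this sketch does not, so treat its output as a starting point rather than a drop-in profile. `build_profile` and `write_profile` are hypothetical helper names.

```python
import json
from collections import Counter
from pathlib import Path


def build_profile(name: str, text: str, max_n: int = 3) -> dict:
    """Count character 1..max_n-grams over space-padded words (simplified)."""
    freq = Counter()
    n_words = [0] * max_n  # total n-grams seen, per n-gram length
    for word in text.split():
        padded = f" {word} "  # pad so word-initial/final grams are captured
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                gram = padded[i:i + n]
                if gram.strip() == "":  # skip grams made only of padding spaces
                    continue
                freq[gram] += 1
                n_words[n - 1] += 1
    return {"name": name, "freq": dict(freq), "n_words": n_words}


def write_profile(profile: dict, directory: str) -> Path:
    """Write the profile as JSON to <directory>/<name>, one file per language."""
    path = Path(directory) / profile["name"]
    path.write_text(json.dumps(profile, ensure_ascii=False), encoding="utf-8")
    return path
```

If you try it, Python langdetect's `DetectorFactory.load_profile` reads a directory of such JSON files; how well a profile built this simply would perform has not been checked here, and a Sinhala profile would need a large plain-text corpus.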

The langdetect documentation shows how to add a language, so please try it. And if you find a library that supports Sinhala, please let me know.

Thank you for the quick reply, @rishier827. I will certainly try your suggestion.