NLU customization for Arabic language

Hi everyone,

I am working on a customized NLU pipeline for the Arabic language. I am trying out different components and comparing the performance I get with each of them.

Do you have any advice to make sure I am heading in the right direction? Can you suggest any projects that have used transformers (like BERT) in the pipeline for a non-English language?

Any other suggestions would be helpful.

@koaning

SpacyNLP doesn’t seem to have an Arabic language model yet, but you can try the fastText vectors for Arabic, as explained in the Rasa language support documentation.

Hi all.

Just to check, @Pain: have you seen my blog post on non-English tools in Rasa?

Yes, I have seen it before. It is really helpful.

But I am looking for examples of pipelines that were used for Arabic, so I can see which components they include. That way I will understand where to put the transformer model in the pipeline and which of the available components to use with it.

Technically, the pipeline below contains the LaBSE model, which is multilingual and should have some notion of Arabic.

pipeline:
- name: WhitespaceTokenizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: LanguageModelFeaturizer
  model_name: bert
  model_weights: rasa/LaBSE
- name: DIETClassifier
  epochs: 100

Is that what you mean?
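
To make the second CountVectorsFeaturizer in the config above more concrete, here is a rough sketch of what the `char_wb` analyzer with `min_ngram: 1` and `max_ngram: 4` extracts. It is a plain-Python approximation of the scikit-learn `char_wb` behaviour (character n-grams taken inside word boundaries, each word padded with one space), not Rasa's actual code. The function name `char_wb_ngrams` and the sample word are only illustrative.

```python
def char_wb_ngrams(text, min_n=1, max_n=4):
    """Approximate a `char_wb` analyzer: character n-grams that never
    cross word boundaries, with each word padded by one space per side."""
    ngrams = []
    for word in text.split():  # whitespace tokenization, as in the pipeline
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            # A word shorter than n still contributes itself once.
            window_count = max(1, len(padded) - n + 1)
            for start in range(window_count):
                ngrams.append(padded[start:start + n])
    return ngrams


if __name__ == "__main__":
    # Illustrative Arabic word ("book"); any whitespace-separated text works.
    for gram in char_wb_ngrams("كتاب"):
        print(repr(gram))
```

Sub-word features like these can help when prefixes and suffixes are attached to the word, which is common in Arabic, because related word forms still share many n-grams.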

Yes, that is what I mean. I have tried the following combination, but I think there is an error when using asafaya/bert-base-arabic.

version: "2.0"
language: ar
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "rasa/LaBSE"
  - name: DIETClassifier
    epochs: 200
  - name: FallbackClassifier
    threshold: 0.7
  - name: DucklingEntityExtractor
    url: http://localhost:8000
    dimensions:
    - time
    - number
  - name: EntitySynonymMapper
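
For readers wondering what `threshold: 0.7` on the FallbackClassifier does, the sketch below shows the basic idea: if the best intent's confidence is under the threshold, a fallback intent is predicted instead. This is a simplified illustration, not Rasa's implementation. It ignores the separate ambiguity threshold, and the function name `apply_fallback` and the example intents are hypothetical.

```python
FALLBACK_INTENT = "nlu_fallback"  # Rasa 2.x default fallback intent name


def apply_fallback(intent_ranking, threshold=0.7):
    """Return the top intent name, or the fallback intent when the
    top confidence is below `threshold` (simplified sketch)."""
    if not intent_ranking:
        return FALLBACK_INTENT
    top = max(intent_ranking, key=lambda item: item["confidence"])
    if top["confidence"] < threshold:
        return FALLBACK_INTENT
    return top["name"]


if __name__ == "__main__":
    # Hypothetical classifier output for one message.
    ranking = [
        {"name": "greet", "confidence": 0.55},
        {"name": "book_appointment", "confidence": 0.30},
    ]
    print(apply_fallback(ranking))
```

With a small Arabic training set, confidences can be low at first, so a threshold of 0.7 may send many messages to the fallback. It is worth checking the confidence values on test data before settling on a number.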

I described the problem in another post. I wrote it before your reply, and it may be a different question. In case you want to check, here it is: Problem when using transformer in NLU pipeline

Hi Pain, can you please help us and tell us which Arabic pipeline you finally found to work best? Thanks.

Hello @Pain, I saw the comments. How are things going so far? I faced the same issue.

Hey Abdo, have you solved your issue? I’m now developing an Arabic chatbot and could use your advice. I would really appreciate any help you can provide.