Email classification

Hi, I have been using Rasa NLU in my work recently. My task is to classify emails into two intents and extract a lot of entities, such as person names, account numbers, membership numbers, and phone numbers. I have tried many pipeline configurations, but I couldn't find one that succeeds at extracting these entities… A single email contains many entities to extract, but the results are not conclusive…

Has anyone worked on a similar problem who can give me some tips? Is there an order that the NLU uses to extract entities? Sometimes when I move a word before another, the NLU recognizes it, and sometimes it doesn't…

Thank you for helping me

What kind of pipeline are you currently using? Can you also share some examples of your training data? That would make it easier to understand what kind of entities you are trying to extract. Also, what language are you working with?

In general, we have had quite good experience with the DIETClassifier. Often it is already sufficient to have a pipeline similar to:

- name: WhitespaceTokenizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper

Hello, I am using the French language with this pipeline :slight_smile:

language: "fr"

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: CRFEntityExtractor
- name: LexicalSyntacticFeaturizer
- name: EntitySynonymMapper
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  batch_size: 64
  entity_recognition: False

I want to recognize person names, account numbers, membership numbers, etc., but the NLU recognizes some of them only some of the time. The results are very random… I have implemented some regexes.

Thank you

Can you show some of the mistakes your model makes?

For example:

Test #1 (650 ms) for “Bonjour, je suis Madame Ludivine, numéro d’adhérent : 1\1256325, je vous met ci-joint mon rib vous pouvez-joindre au : 06 32 14 89 12”:

Intents found:

  • modif_RIB (confidence: 0.9982733726501465)
  • autre (confidence: 0.0017266019713133574)

Entities found:

  • civilite: “madame”
  • numero_adherent: “1\1256325”

Test #2 (1111 ms) for “Bonjour, numéro d’adhérent : 1\1256325, je vous met ci-joint mon rib vous pouvez-joindre au : 06 32 14 89 12 Madame Ludivine”:

Intents found:

  • modif_RIB (confidence: 0.9998962879180908)
  • autre (confidence: 1.0371021926403046E-4)

Entities found:

  • numero_adherent: “1\1256325”
  • civilite: “Madame Ludivine”

Another example, Test #9 (1021 ms) for “Nouvel identité bancaire, numéro : 1/125468954”:

Intents found:

  • modif_RIB (confidence: 1)
  • autre (confidence: 1.0737703044425007E-12)

Entities found:

  • numero_adherent: “1/125468954”

Test #10 (694 ms) for " numéro : 1/125468954 Nouvel identité bancaire":

Intents found:

  • modif_RIB (confidence: 1)
  • autre (confidence: 9.96091447172387E-13)

Entities found:

  • (none)

Civilite is “Madame” or “Monsieur”. But the NLU does not recognize the name Ludivine or the phone number; sometimes it recognizes one of them and sometimes not. Perhaps the order matters? If you can give me some tips to improve the entity extraction, that would be great.

This is a very simple example, but in many cases the email is much longer than this one.

How much training data do you have? E.g. how many examples per entity? It often helps just to add a couple more examples to the training data.
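For reference, entity annotations in the Rasa YAML training data format look like this. This is only a sketch built from the intent and entity names mentioned in this thread; the `nom` and `telephone` entity names and the exact example sentences are assumptions, so adapt them to your own data:

```yaml
nlu:
- intent: modif_RIB
  examples: |
    - Bonjour, je suis [Madame](civilite) [Ludivine](nom), numéro d'adhérent : [1/125468954](numero_adherent)
    - Nouvelle identité bancaire, numéro : [1/125468954](numero_adherent)
    - Je vous joins mon nouveau RIB, vous pouvez me joindre au [06 32 14 89 12](telephone)
```

Each entity the model should learn needs a reasonable number of annotated occurrences, ideally appearing at varying positions within the sentences, so the model does not latch onto word order.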

Also, if you want to extract numbers with a certain pattern, I recommend using either Duckling or the RegexEntityExtractor. That should help you extract entities that follow a certain pattern.
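As a sketch, the regexes used by the RegexEntityExtractor are defined in the training data, and each regex name must match an entity name. The patterns below are assumptions based on the formats in your test messages (membership numbers like `1/125468954` or `1\1256325`, French phone numbers like `06 32 14 89 12`), so adjust them to your real formats:

```yaml
nlu:
- regex: numero_adherent
  examples: |
    - \d[\\/]\d{6,9}
- regex: telephone
  examples: |
    - 0\d(?:[ .-]?\d{2}){4}
```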

Also, it might be a good idea to switch to the DIETClassifier for entity extraction instead of the CRFEntityExtractor, as it is usually a bit more powerful.

So maybe you can try the following config, using the DIETClassifier to extract civilite and the RegexEntityExtractor to extract the phone number, for example:

- name: WhitespaceTokenizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: RegexEntityExtractor
- name: EntitySynonymMapper

Thank you, I am working on it. It's already better. I am using this one now:

language: "fr"

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: RegexEntityExtractor
  use_regexes: True
- name: CRFEntityExtractor
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper

I have a question: does the order of components matter, e.g. RegexEntityExtractor before DIETClassifier? Is it better to put the RegexEntityExtractor after the DIETClassifier?