Hi, I have recently started using Rasa NLU at work. My task is to classify emails into two intents and to extract many entities, such as person name, account number, membership number, and phone number.
I have tried many pipeline configurations, but I couldn’t find one that manages to extract these entities. A single email contains many entities to extract, but the results are not conclusive.
Has anyone worked on a similar task who can give me some tips?
Does the NLU extract entities in a particular order? Sometimes when I move a word in front of another one, the NLU recognizes it, and sometimes it doesn’t.
What kind of pipeline are you currently using? Also can you maybe share some examples of your training data? That would make it easier to understand what kind of entities you are trying to extract. Also what language are you using?
In general, we have had quite good experience with the DIETClassifier. It is often already sufficient to have a pipeline similar to
I want to recognize person names, account numbers, membership numbers, and so on, but sometimes the NLU recognizes some of these and sometimes not. The results look very random. I have also implemented some regexes.
For example:
#####################################################################
Test no. 1 in 650 ms for “Bonjour, je suis Madame Ludivine, numéro d’adhérent : 1\1256325, je vous met ci-joint mon rib vous pouvez-joindre au : 06 32 14 89 12”:
Intents found:
modif_RIB (confidence: 0.9982733726501465)
autre (confidence: 0.0017266019713133574)
Entities found:
civilite: “madame”
numero_adherent: “1\1256325”
Test no. 2 in 1111 ms for “Bonjour, numéro d’adhérent : 1\1256325, je vous met ci-joint mon rib vous pouvez-joindre au : 06 32 14 89 12 Madame Ludivine”:
Intents found:
modif_RIB (confidence: 0.9998962879180908)
autre (confidence: 1.0371021926403046E-4)
Entities found:
numero_adherent: “1\1256325”
civilite: “Madame Ludivine”
Another example:
Test no. 9 in 1021 ms for “Nouvel identité bancaire, numéro : 1/125468954”:
Intents found:
modif_RIB (confidence: 1)
autre (confidence: 1.0737703044425007E-12)
Entities found:
numero_adherent: “1/125468954”
Test no. 10 in 694 ms for “numéro : 1/125468954 Nouvel identité bancaire”:
Intents found:
modif_RIB (confidence: 1)
autre (confidence: 9.96091447172387E-13)
Entities found:
####################################################
civilite is either Madame or Monsieur.
But the NLU doesn’t recognize the name Ludivine or the phone number; sometimes it recognizes one of them and sometimes not.
Perhaps the order matters? If you could give me some tips to improve the entity extraction, that would be great.
How much training data do you have, e.g. how many examples per entity? It often helps just to add a few more examples to the training data.
Also, if you want to extract numbers that follow a certain pattern, I recommend using either Duckling or the RegexEntityExtractor. Both can help you extract entities that follow a fixed pattern.
Also, it might be a good idea to switch from CRFEntityExtractor to DIETClassifier for entity extraction, as it is usually a bit more powerful.
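To illustrate the regex route, here is a minimal sketch of regex definitions in Rasa's YAML training data format. It assumes Rasa 2.x or later (in 1.x the training data is written in Markdown instead). The entity name `telephone`, both patterns, and the two annotated sentences are assumptions derived from the test messages in this thread, not taken from the actual training data. For the RegexEntityExtractor, the name of each regex has to match the entity name, and each entity needs at least one annotated example.

```yaml
# nlu.yml (sketch, assuming the Rasa 2.x YAML training data format)
version: "2.0"
nlu:
# Membership number: one digit, a slash or backslash, then 7 to 9 digits.
# The digit range is a guess based on "1\1256325" and "1/125468954".
- regex: numero_adherent
  examples: |
    - \d[/\\]\d{7,9}
# French-style phone number written as five space-separated digit pairs,
# e.g. "06 32 14 89 12". "telephone" is a hypothetical entity name.
- regex: telephone
  examples: |
    - 0\d(?: \d{2}){4}
# Hypothetical annotated examples so that both entities appear in the data.
- intent: modif_RIB
  examples: |
    - Bonjour, mon numéro d'adhérent est [1/125468954](numero_adherent)
    - Vous pouvez me joindre au [06 32 14 89 12](telephone)
```

Because these patterns are matched against the message text itself, a number written in this format should be found wherever it appears in the sentence.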
So maybe you can try the following config and use the DIETClassifier to extract civilite and the RegexEntityExtractor to extract the phone number, for example.
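As a rough sketch of what such a setup could look like (not necessarily the exact config meant in this reply), assuming Rasa 2.x or later and French as the language; the featurizer choices and the number of epochs are illustrative assumptions:

```yaml
# config.yml (sketch; component options are illustrative, not tuned)
language: fr
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  # Character n-grams help with typos and unseen word forms.
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  # Learns intents and free-form entities such as civilite from examples.
  - name: DIETClassifier
    epochs: 100
  # Extracts entities whose regex name matches an entity name in the data,
  # e.g. the phone number.
  - name: RegexEntityExtractor
    use_lookup_tables: false
    use_regexes: true
```

With a setup like this, entities that follow a fixed format rely on the regexes, while entities such as civilite or the person's name still depend on having enough annotated examples for the DIETClassifier.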
I have a question: does the order of components matter, e.g. putting RegexEntityExtractor before DIETClassifier? Is it better to put RegexEntityExtractor after DIETClassifier?