The difficulties of regex entity extraction, and how Rasa should improve it

The Rasa docs explain how to use regular expressions for entity extraction. But when I tried it, I found that it has many limitations and is hard to use, and I think it has a bug. Here are some examples.

In the first case, I want to design an intent in nlu.yml that has two different entities sharing the same regex. The NLU example is: “I have made a phone call with my telephone number 1234567 to call my friend with his phone number 7654321”. The out_number and in_number entities are both telephone numbers, and both use the regex \d{7}. I think this is a common scenario: the entities are of different types, but they look the same. But Rasa could not do the extraction successfully; it reported an error during the dialogue. When it catches either phone number in the sentence, it cannot tell which of the two regexes to apply. What I actually want is to train Rasa to learn that the first match is the outgoing caller (out_number) and the second match is the callee (in_number). I think this is a defect in Rasa.
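To make the behaviour I am asking for concrete, here is a minimal sketch in plain Python. It is not Rasa code and does not use any Rasa API; the function name `assign_phone_roles` is made up for illustration, and the `\b` word boundaries are my own assumption so that longer digit strings are not matched. It simply labels the first 7-digit match as out_number and the second as in_number.

```python
import re

# Assumption: a phone number is exactly 7 digits, delimited by word boundaries.
PHONE_PATTERN = re.compile(r"\b\d{7}\b")


def assign_phone_roles(text):
    """Label the first 7-digit match as out_number and the second as in_number."""
    names = ["out_number", "in_number"]
    matches = list(PHONE_PATTERN.finditer(text))
    # zip stops at the shorter sequence, so extra matches are ignored
    # and a sentence with one number only yields out_number.
    return [
        {"entity": name, "value": m.group(), "start": m.start(), "end": m.end()}
        for name, m in zip(names, matches)
    ]


text = (
    "I have made a phone call with my telephone number 1234567 "
    "to call my friend with his phone number 7654321"
)
entities = assign_phone_roles(text)
for entity in entities:
    print(entity["entity"], entity["value"])
```

A purely positional rule like this is fragile (it breaks if the user mentions the callee first), which is why a learned distinction between the two numbers would be preferable.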

In the second case, I designed an intent containing two entities: one is a \d{7} telephone number and the other is a \d{4} ID number. When the ID number is extracted, both the ID number and part of the phone number are returned, because the \d{4} regex also matches the first four digits of the phone number, just as it matches the ID string ‘1234’. If the phone number comes later in the sentence, Rasa extracts the first 4 digits of the phone number and uses them to replace the correctly extracted ID number. I think this mechanism is wrong. When there are several matches, Rasa should extract only the most likely one. Furthermore, I think Rasa should apply the longer regex first and should not extract from those already matched words again when it applies the shorter regex later (each word is matched only once). I looked at the Rasa source code, regex_entity_extractor.py, changed it to meet my requirement, and it worked. I think Rasa should improve regex extraction in these more complex situations. What’s your opinion?
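The "longer regex first, each word only once" idea can be sketched as follows. This is a standalone illustration in plain Python, not the contents of regex_entity_extractor.py and not the exact change I made there; the names `PATTERNS` and `extract_longest_first` are hypothetical. It collects every candidate match, prefers longer matches, and drops any candidate that overlaps a span already taken.

```python
import re

# Hypothetical patterns mirroring the example: entity name -> regex.
PATTERNS = {"phone_number": r"\d{7}", "id_number": r"\d{4}"}


def extract_longest_first(text, patterns):
    """Return non-overlapping entity matches, preferring longer matches."""
    candidates = []
    for name, pattern in patterns.items():
        for m in re.finditer(pattern, text):
            candidates.append((m.start(), m.end(), name, m.group()))

    # Longest match first; ties are broken by earlier position in the text.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[0]))

    taken = []
    for start, end, name, value in candidates:
        # Keep a candidate only if it does not overlap any accepted span.
        if all(end <= s or start >= e for s, e, _, _ in taken):
            taken.append((start, end, name, value))

    taken.sort()  # restore reading order
    return [
        {"entity": name, "value": value, "start": start, "end": end}
        for start, end, name, value in taken
    ]


result = extract_longest_first("my id is 1234 and my phone is 7654321", PATTERNS)
for entity in result:
    print(entity["entity"], entity["value"])
```

Here the \d{4} pattern still matches "7654" inside the phone number, but that candidate is discarded because the 7-digit match has already claimed those characters.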

The roles and groups features are designed to help with this. You can read about it in this blog post and in the docs here.
