Need clarity RASA Regex

surya7592 · August 20, 2019, 1:05pm

Hi,

I am working on a chatbot that requires the user to enter many unique IDs, for example account numbers or complaint numbers such as xx0934345, xa237472834, xb3453.

To extract these IDs I used regular expressions, and that works, but what I really don't understand is the role of the training examples. According to some forum posts, the regex needs some examples to work, but that does not work as expected. For example, I have given a few examples like:

my account number is xa4538475
xa223423904 is my account number
compliant id xc23423
compaint request id is xc3242384

A model trained on these examples only extracts IDs like xa23423 or xc678678. IDs like xk324234, xu234234 or xo234234 are not extracted, since there are no similar examples in the NLU data. As per my regex, it should identify and extract every pattern that starts with two letters followed by digits.
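To make the expectation concrete, here is a minimal sketch in plain Python (not Rasa code). The pattern itself is an assumption, since the actual regex is not shown in this post: two letters followed by one or more digits. The names ID_PATTERN and find_ids are only illustrative.

```python
import re

# Assumed pattern (the real regex is not shown in the post):
# two letters followed by one or more digits, as a whole word.
ID_PATTERN = re.compile(r"\b[a-z]{2}\d+\b", re.IGNORECASE)


def find_ids(text):
    """Return every substring of `text` that matches the ID pattern."""
    return ID_PATTERN.findall(text)


# Prefixes from the training examples and unseen prefixes match alike,
# because the regex itself does not depend on any training data.
for sentence in [
    "my account number is xa4538475",
    "xk324234 is my account number",
    "compliant id xu234234",
]:
    print(sentence, "->", find_ids(sentence))
```

Run on its own, the regex matches xk324234 and xu234234 just as it matches xa4538475, which is the behaviour I expected from the bot.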

Is there any way to handle this? @akelad @dakshvar22 @Juste

Tanja · August 22, 2019, 7:45am

Whenever you have the RegexFeaturizer in your NLU pipeline, Rasa looks for matches of your defined regular expressions in the text. However, Rasa does not extract those matches as entities directly; it creates features from them, e.g. whether a word matches the regex for account or not. Those features are added to the features used by the CRF (the model that extracts entities). Thus, you need to add some examples to the NLU data, so that the CRF can learn that those features are relevant for deciding whether a word is an entity or not.

So, maybe double check if the RegexFeaturizer is in your pipeline and try to add some more examples to the NLU data.
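As a rough illustration of the "feature, not entity" idea, here is a sketch in plain Python. It is not Rasa's actual implementation: the whitespace tokenizer, the pattern, and the names token_features and matches_id_regex are all assumptions made for the example.

```python
import re

# Assumed pattern for illustration: two letters followed by digits.
ID_PATTERN = re.compile(r"[a-z]{2}\d+", re.IGNORECASE)


def token_features(text):
    """Build one feature dict per whitespace-separated token.

    The regex only contributes a boolean flag. Whether a flagged token
    is labelled as an entity is decided later by a trained sequence
    model, which has to learn from examples that the flag matters.
    """
    features = []
    for token in text.split():
        features.append(
            {
                "token": token.lower(),
                "matches_id_regex": ID_PATTERN.fullmatch(token) is not None,
            }
        )
    return features


for feats in token_features("my account number is xk324234"):
    print(feats)
```

The flag is set for xk324234 regardless of its prefix; what the model does with that flag depends on the other features it sees and on the training examples.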

surya7592 · September 2, 2019, 9:05am

Hi,

This goes against the whole idea of using a regular expression.

I have included the RegexFeaturizer in the pipeline. It does capture regex matches, but only ones similar to the examples in the NLU training data. Adding more examples to the training data helps, but it makes the training data huge. In our case we cannot include every possible combination in the data, because the ID format may vary from time to time.

For example, right now the IDs may be ab000123, ac234145, ad3456789, but later they may change to zw4567257, zt7531598, etc.

How can we handle such cases?

Tanja · September 9, 2019, 10:54am

@surya7592 Maybe we can add a flag to the RegexFeaturizer that indicates whether to add the regex as a feature or to take the matches directly as entities. What do you think? Can you open an issue for that on GitHub?
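Taking the matches directly as entities could look roughly like the sketch below. This is plain Python, not an existing Rasa flag or component; the function name extract_entities, the PATTERNS mapping, the entity name account_number, and the layout of the returned dicts are all assumptions for illustration.

```python
import re

# Hypothetical mapping from entity name to compiled pattern.
PATTERNS = {
    "account_number": re.compile(r"\b[a-z]{2}\d+\b", re.IGNORECASE),
}


def extract_entities(text, patterns=PATTERNS):
    """Return every regex match in `text` as an entity dict.

    No trained model is involved, so unseen prefixes such as "zw" or
    "zt" are extracted as long as they fit the pattern.
    """
    entities = []
    for name, pattern in patterns.items():
        for match in pattern.finditer(text):
            entities.append(
                {
                    "entity": name,
                    "value": match.group(0),
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    return entities


print(extract_entities("it may change to zw4567257 later"))
```

The trade-off is that every match becomes an entity regardless of context, so the pattern needs to be strict enough to avoid false positives.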
