How to configure Rasa Regex to prefer longer match?

Talos0248 · March 17, 2021, 8:58pm

Hi there, I’m trying out lookup tables to detect cities. In my lookup table, I have both “batu” (from Indonesia) and “batu caves” (from Malaysia) as examples. I’m using RegexEntityExtractor in my pipeline.

Once the model is trained, I typed in: “What’s the weather in Batu Caves?”

However, the model decided to extract the entity “batu” instead of “batu caves”.

Is there a way to customize the regex behaviour to prefer longer matches?

Talos0248 · March 18, 2021, 9:10am

Update - Here is what my training data looks like:

Inside my cities.yml:

The ‘batu’ section I was referring to:

Talos0248 · March 18, 2021, 12:11pm

Update: I SOLVED IT!

So what I did was sort my lookups in descending order of length, so that the entries with the most characters are at the top of the list. Now the RegexEntityExtractor automatically picks the longest matching entry!
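A likely explanation for why the ordering matters, sketched with plain Python `re`: in a regex alternation, alternatives are tried left to right and the first one that matches wins, even if a later alternative would match more text. The sketch assumes the lookup entries end up joined into one alternation in file order; that is an assumption about Rasa's internals, not something confirmed in this thread, and the two patterns below are hand-written for illustration.

```python
import re

text = "what's the weather in batu caves?"

# Assumption: lookup entries are joined with "|" in the order they appear
# in the lookup file. These patterns are illustrative, not Rasa's own.
short_first = re.compile(r"\b(batu|batu caves)\b")
long_first = re.compile(r"\b(batu caves|batu)\b")

# Alternation is ordered: the first alternative that matches is used.
print(short_first.search(text).group(1))  # batu
print(long_first.search(text).group(1))   # batu caves
```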

It’s not your traditional workaround, but it somehow works. For anyone curious how I sorted the entries by descending character length, here’s what I did:

1. I converted the lookup .yml file into a .csv file by changing the file extension.
2. I opened the .csv file in Excel.
3. I removed the header lines (version, nlu:, lookup: city, examples: |).
4. I used this method to sort the entries by character length, then deleted the length column.
5. I saved the .csv file, then renamed it back to .yml.
6. I re-added the headers with Notepad and saved in UTF-8 format (since Notepad++ gave me issues for some reason).
7. I popped the file back into the data folder, trained my NLU model, and everything works like a charm!
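The manual steps above could also be scripted. Below is a minimal sketch, assuming the lookup file uses the usual Rasa 2.x layout (a `- lookup:` item followed by an `examples: |` block with one `- entry` per line). The function name `sort_examples_longest_first` and the sample data are hypothetical, and the text handling is deliberately simple: it is not a full YAML parser.

```python
def sort_examples_longest_first(yaml_text: str) -> str:
    """Reorder the entries of every lookup 'examples: |' block, longest first."""
    out, block = [], []
    key_indent = None   # indentation of the current "examples:" key, if any
    in_lookup = False   # True while inside a "- lookup:" item

    def flush():
        # sorted() is stable, so entries of equal length keep their order.
        out.extend(sorted(block, key=lambda s: len(s.strip()), reverse=True))
        block.clear()

    for line in yaml_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        # Non-blank lines indented deeper than "examples:" belong to the block.
        if key_indent is not None and stripped and indent > key_indent:
            block.append(line)
            continue
        flush()
        if stripped.startswith("- "):
            in_lookup = stripped.startswith("- lookup:")
        is_key = in_lookup and stripped.startswith("examples:")
        key_indent = indent if is_key else None
        out.append(line)
    flush()
    return "\n".join(out) + "\n"


# Hypothetical sample data in the assumed layout.
sample = (
    'version: "2.0"\n'
    "nlu:\n"
    "  - lookup: city\n"
    "    examples: |\n"
    "      - batu\n"
    "      - kuala lumpur\n"
    "      - batu caves\n"
)
result = sort_examples_longest_first(sample)
print(result)
```

To apply it to a real file, read the file as UTF-8 text, pass the contents through the function, and write the result back, for example with `pathlib.Path.read_text` and `write_text`. Keep a backup of the original file first.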

Hope this helps someone out!
