Pattern for lookup tables

lgabs · June 11, 2020, 12:41am

I’m using lookup tables, but I’ve been a bit confused about two related things:

Where in the logs can I check whether my lookup tables were correctly mapped to the desired entity (assuming there are no directory errors)?
What pattern(s) should I follow when referencing them?

The rasa-demo project has the following example, even though the entities defined in its domain are location and product (not products):

## regex:greet
- hey[^\s]*

## regex:zipcode
- [0-9]{5}

## lookup:location.txt
data/nlu/lookups/location.txt

## lookup:products.txt
data/nlu/lookups/products.txt

In the docs, we have this:

## lookup:plates
data/test/lookup_tables/plates.txt

## intent:food_request
- I'd like beef [tacos](plates) and a [burrito](plates)
- How about some [mapo tofu](plates)
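The docs example pairs the lookup table with training examples annotated as `[value](entity)`. To quickly list which values are annotated with which entity (for instance, to compare them against a lookup file), a minimal Python sketch like the following can help. The regex and the `extract_entities` helper are my own illustration, not part of Rasa, and they only handle the plain `[value](entity)` form shown above.

```python
import re
from typing import List, Tuple

# Illustrative regex (not Rasa's parser) for plain [value](entity) annotations.
ANNOTATION = re.compile(r"\[(?P<value>[^\]]+)\]\((?P<entity>[^)]+)\)")


def extract_entities(example: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: return the (value, entity) pairs annotated in one example."""
    return [(m.group("value"), m.group("entity")) for m in ANNOTATION.finditer(example)]


examples = [
    "I'd like beef [tacos](plates) and a [burrito](plates)",
    "How about some [mapo tofu](plates)",
]
for ex in examples:
    print(extract_entities(ex))
# [('tacos', 'plates'), ('burrito', 'plates')]
# [('mapo tofu', 'plates')]
```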

Furthermore, with Rasa 1.10.0 and Rasa X 0.28.5, I’m noticing that Rasa X rewrites the format to match the one used in the rasa-demo project.

Tanja · June 12, 2020, 9:23am

Hi @lgabs, if you have the RegexFeaturizer in your pipeline and the assistant trains without any errors, the lookup tables are used. We don’t log whether they were used successfully; we only log if something went wrong.

Regarding usage: as pointed out in our documentation, you need to include some examples in your training data that use an entry from the lookup table. The entries in the lookup tables are converted into features. Let’s take a look at an example:

“Hello my name is Tanja.”

Assume you have a lookup table for names that includes the word “Tanja”. The RegexFeaturizer checks whether it can find an entry from the lookup table in the text. If it finds a match, it sets a feature that basically says “Tanja” is listed in a lookup table. The features are then passed on to the model, for example DIETClassifier, which then hopefully learns the correlation between “Tanja” being listed in the lookup table and “Tanja” being an entity of type name. If your training data contains no examples with an entry from the lookup table, the model cannot learn this correlation, because it is never present. So it is important to include some entries from your lookup table in the training data and annotate them as entities.
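The matching step described above can be sketched in a few lines of Python. This is a simplified illustration, not Rasa’s RegexFeaturizer: it assumes a plain `\w+` tokenizer and a single binary feature per lookup table, and the helper names `build_lookup_regex` and `token_features` are hypothetical.

```python
import re
from typing import List, Tuple


def build_lookup_regex(entries: List[str]) -> re.Pattern:
    """Hypothetical helper: combine lookup entries into one word-bounded, case-insensitive regex."""
    # Longest entries first, so multi-word entries are tried before shorter ones.
    alternatives = [re.escape(e) for e in sorted(entries, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def token_features(text: str, pattern: re.Pattern) -> List[Tuple[str, int]]:
    """Mark each token with 1 if it lies inside a lookup-table match, else 0."""
    matches = [m.span() for m in pattern.finditer(text)]
    features = []
    for tok in re.finditer(r"\w+", text):  # simplistic tokenizer (assumption)
        start, end = tok.span()
        inside = any(s <= start and end <= e for s, e in matches)
        features.append((tok.group(), int(inside)))
    return features


names = build_lookup_regex(["Tanja", "Paul"])
print(token_features("Hello my name is Tanja.", names))
# [('Hello', 0), ('my', 0), ('name', 0), ('is', 0), ('Tanja', 1)]
```

The feature only says “this token matches a lookup entry”; the model still has to learn from annotated examples that this signal means the token is a name.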

lgabs · June 12, 2020, 1:52pm

Thanks for the reply and explanation! I am indeed including some examples in the training data, and I’m getting good results so far, except for names, which are very hard. I was afraid I was using the wrong pattern for referencing the lookup tables, as shown above, especially because Rasa X reformats the pattern to match ##lookup:<entity_name>.txt or .csv, depending on the file, and rasa-demo has a product entity while its NLU lookup reference says products.txt.

Tanja · June 12, 2020, 2:29pm

Could be that our examples are not up-to-date. Thanks for pointing that out, will have a look.

PaulB · June 16, 2020, 3:49pm

Hello,

It looks like I have the same problem.

Using the RegexFeaturizer with DIET gives really bad results for my entities in lookup tables. Although I have many training sentences (more than 200) containing examples from the lookup tables, entities that do not appear in the training sentences are mostly not recognized.

I tried to open a topic about that (here) but never got a relevant answer.

I can share 200 tests comparing RegexFeaturizer + CRFEntityExtractor with RegexFeaturizer + DIETClassifier.

Tanja · June 17, 2020, 12:34pm

Hi @PaulB!

I just read through the thread you shared. I just want to make clear that DIETClassifier and CRFEntityExtractor do not influence each other: if CRFEntityExtractor comes first in the pipeline, the entities it detects are not used inside DIETClassifier.

As mentioned above, the RegexFeaturizer converts the lookup tables into features that are then used by components later in the pipeline. So if you are using the DIETClassifier without entity recognition, but you have a RegexFeaturizer in the pipeline and have defined some lookup tables, those features will still be used for intent classification in the DIETClassifier. However, in our experience those features are normally not that important for intent classification.

Regarding your problem with the DIETClassifier’s poor performance on entities: it is hard to tell what might be going wrong without looking at the data and some evaluation metrics. How many entities do you have? What entities do you have? How many examples do you have per entity in your training data? Is it only the DIETClassifier that performs badly, or also the CRFEntityExtractor?

PaulB · June 25, 2020, 9:50am

Hi Tanja,

Thank you for your response.

How many entities do you have?

I have 3 entities (GivenName, LastName, Service), each linked to a lookup table with 150 examples on average.

How many examples do you have per entity in your training data?

In my data.json:

GivenName: 50
LastName: 150
Service: 200

Is it only the DIETClassifier that performs badly, or also the CRFEntityExtractor?

CRFEntityExtractor performs well. DIETClassifier works well for intent detection, but is terrible at entity extraction (particularly for entities that are in the lookup tables but not in data.json).
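For the case described here (entities that appear in the lookup tables but not in data.json), it can help to split each lookup table into entries that also appear as annotated entities in the training data and entries that never do, and then evaluate on test sentences built from the second group. A minimal sketch, assuming you have already collected (value, entity) pairs from your training data (in Rasa’s JSON format these should correspond to the `value` and `entity` fields of each example’s `entities` list). `split_lookup_coverage` and the sample names are illustrative only, not data from this thread.

```python
from typing import Iterable, List, Set, Tuple


def split_lookup_coverage(
    lookup_entries: List[str],
    annotations: Iterable[Tuple[str, str]],
    entity: str,
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: split lookup entries into (seen, unseen) w.r.t. training annotations.

    `annotations` holds (value, entity) pairs from the training data;
    matching is case-insensitive.
    """
    annotated = {value.lower() for value, ent in annotations if ent == entity}
    seen = {e for e in lookup_entries if e.lower() in annotated}
    unseen = set(lookup_entries) - seen
    return seen, unseen


# Illustrative data only.
lookup = ["Smith", "Dupont", "Martin"]
annotations = [("smith", "LastName"), ("Martin", "LastName"), ("Paris", "City")]
seen, unseen = split_lookup_coverage(lookup, annotations, "LastName")
print(sorted(seen), sorted(unseen))
# ['Martin', 'Smith'] ['Dupont']
```

Comparing extraction on the “seen” and “unseen” groups separately shows how much the model actually generalizes from the lookup features rather than memorizing annotated values.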

ishamandrekar · August 9, 2020, 2:45pm

When I train Rasa NLU, I get this warning: c:\users\isha\anaconda3\envs\installingrasa\lib\site-packages\rasa\nlu\featurizers\sparse_featurizer\regex_featurizer.py:79: FutureWarning: Directly including lookup tables as a list is deprecated since Rasa 1.6. regex_pattern = self._generate_lookup_regex(table)

I have a lookup table, which I referenced this way in nlu.md:

## lookup:food
- data/food_list.txt

I have included some examples from it in the training data.

My pipeline contains both CRFEntityExtractor and DIETClassifier, with entity_recognition: False for DIETClassifier, so I want CRFEntityExtractor to use the lookup table. I also have the required RegexFeaturizer in my config.

Still, the lookup table is not being used: examples that are in the lookup table but not in the training data are not extracted.

ishamandrekar · August 9, 2020, 2:57pm

I was able to identify and resolve my issue. Maybe this will help others who are having difficulty with lookups: in nlu.md, you shouldn’t put a hyphen in front of the lookup file path, because the RegexFeaturizer then treats it as a list. Only give the path, without the hyphen, like so:

## lookup:food
data/food_list.txt
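To illustrate why the hyphen matters, here is a toy parser for `## lookup:` sections that mirrors the behaviour described above: a bare line is read as a path to a lookup file, while a line starting with `- ` is read as an inline list entry. It is not Rasa’s reader, and `parse_lookup_sections` is a hypothetical name. With the hyphen, the path string itself presumably becomes a single lookup entry (matching the FutureWarning about lists), so the featurizer would search messages for the literal text `data/food_list.txt` instead of the file’s contents.

```python
from typing import Dict, List, Optional


def parse_lookup_sections(md: str) -> Dict[str, Dict[str, List[str]]]:
    """Hypothetical toy parser: map each '## lookup:<name>' section to file paths and inline entries."""
    sections: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[str] = None
    for raw in md.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            header = line[3:].strip()
            if header.startswith("lookup:"):
                current = header[len("lookup:"):]
                sections[current] = {"files": [], "inline": []}
            else:
                current = None  # some other section (intent, regex, synonym, ...)
        elif current is not None and line:
            if line.startswith("- "):
                sections[current]["inline"].append(line[2:].strip())  # list entry
            else:
                sections[current]["files"].append(line)  # bare line: file path


    return sections


with_hyphen = "## lookup:food\n- data/food_list.txt\n"
without_hyphen = "## lookup:food\ndata/food_list.txt\n"
print(parse_lookup_sections(with_hyphen))
# {'food': {'files': [], 'inline': ['data/food_list.txt']}}
print(parse_lookup_sections(without_hyphen))
# {'food': {'files': ['data/food_list.txt'], 'inline': []}}
```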
