Pattern for lookup tables

I’m using lookup tables, but I’ve been a bit confused about two related things:

  • where in the logs can I check whether my lookup tables were correctly mapped to the desired entity? (assuming no directory errors)
  • which pattern(s) should I follow when referencing them?

In the rasa-demo project we have the following example, even though the entities defined in the domain are location and product (not products):

## regex:greet
- hey[^\s]*

## regex:zipcode
- [0-9]{5}

## lookup:location.txt
data/nlu/lookups/location.txt

## lookup:products.txt
data/nlu/lookups/products.txt

In the docs, we have this:

## lookup:plates
data/test/lookup_tables/plates.txt

## intent:food_request
- I'd like beef [tacos](plates) and a [burrito](plates)
- How about some [mapo tofu](plates)

Furthermore, with rasa=1.10.0 and rasax=0.28.5, I’m finding that Rasa X rewrites the format to match the one used in the rasa-demo project.

Hi @lgabs, if you have the RegexFeaturizer in your pipeline and the assistant is training without any errors, the lookup tables are used. We don’t log when they are used successfully; we only log if something went wrong.

Regarding the usage: as pointed out in our documentation, you need to include some examples in your training data that use an entry from the lookup table. The entries in the lookup tables are converted into features. Let’s take a look at an example:

“Hello my name is Tanja.”

Assume you have a lookup table for names and that it includes the word “Tanja”. The RegexFeaturizer then checks whether it can find an entry from the lookup table in the text. If it finds a match, it sets a feature that basically says that “Tanja” is listed in a lookup table. The features are then passed on to the model, for example DIETClassifier, and the model then hopefully learns the correlation between “Tanja” being listed in the lookup table and “Tanja” being an entity of type name. If you don’t include any examples in the training data that contain an entry of the lookup table, the model is not able to learn this correlation, as it will not be present. So it is important that you include some entries from your lookup table in the training data and mark them as entities.
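To make the idea concrete, here is a minimal sketch of how lookup entries could be turned into a per-token feature. It is a simplified illustration and not Rasa’s actual implementation: the function names `build_lookup_pattern` and `lookup_features`, the whitespace tokenization, and the `names` list are all assumptions made for this example.

```python
import re

def build_lookup_pattern(entries):
    """Compile one case-insensitive regex matching any lookup entry as a whole word."""
    # Longest entries first, so multi-word entries win over shorter overlapping ones.
    cleaned = [e.strip() for e in entries if e.strip()]
    escaped = [re.escape(e) for e in sorted(cleaned, key=len, reverse=True)]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)

def lookup_features(text, pattern):
    """Return (token, flag) pairs; flag is 1 if the token overlaps a lookup match."""
    spans = [m.span() for m in pattern.finditer(text)]
    features = []
    for tok in re.finditer(r"\S+", text):  # naive whitespace tokenization
        hit = any(tok.start() < end and start < tok.end() for start, end in spans)
        features.append((tok.group(), int(hit)))
    return features

names = ["Tanja", "Paul"]  # hypothetical lookup table contents
pattern = build_lookup_pattern(names)
features = lookup_features("Hello my name is Tanja.", pattern)
print(features)
# [('Hello', 0), ('my', 0), ('name', 0), ('is', 0), ('Tanja.', 1)]
```

The flag only says “this token appears in a lookup table”. Whether that token is an entity of type name is something the model still has to learn from annotated training examples.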

Thanks for the reply and explanation! I am indeed including some examples in the training data, and I’m getting good results so far, except for names, which are very hard :) I was afraid of using the wrong pattern when referencing the tables as above, especially because Rasa X reformats the pattern to match ##lookup:<entity_name>.txt or csv, depending on the file, and because rasa-demo has a product entity while the lookup reference in its NLU data says products.txt.

Could be that our examples are not up-to-date. Thanks for pointing that out, will have a look.

Hello,

It looks like I have the same problem.

Using the RegexFeaturizer and DIETClassifier gives really bad results for my entities in lookup tables. Although I have a lot of training sentences containing examples that are in the lookup tables (more than 200), the entities that are not specified in the training sentences are mostly not recognized.

I tried to open a topic about that (here) but never got a relevant answer.

I can share with you 200 tests comparing RegexFeaturizer + CRFEntityExtractor with RegexFeaturizer + DIETClassifier.

Hi @PaulB!

I just read through the thread you shared. I just wanted to make clear that DIETClassifier and CRFEntityExtractor do not influence each other. So if CRFEntityExtractor comes first in the pipeline, the entities detected by that component are not used inside DIETClassifier.

As mentioned above the RegexFeaturizer converts the lookup tables into features that are then used by the components later in the pipeline. So if you are using the DIETClassifier without entity recognition, but you have a RegexFeaturizer in the pipeline and defined some lookup tables, those features will also be used for the intent classification in the DIETClassifier. However, in our experience those features are normally not that important for intent classification.

Regarding your problem with the DIETClassifier performing badly on entities: it is hard to tell what might be going wrong without looking at the data and some evaluation metrics. How many entities do you have? What entities do you have? How many examples do you have per entity in your training data? Is it only the DIETClassifier that performs badly, or also the CRFEntityExtractor?
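One quick way to answer the “how many examples per entity” question is to count the annotations in the training data. The following is a rough sketch, assuming Markdown-format training data with `[text](entity)` annotations as in the docs example earlier in this thread; `count_entity_examples` is a hypothetical helper, not part of Rasa, and JSON training data would need a different parser.

```python
import re
from collections import Counter

# Matches annotations of the form [some text](entity_name)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

def count_entity_examples(nlu_markdown):
    """Count how often each entity name is annotated in the training text."""
    counts = Counter()
    for line in nlu_markdown.splitlines():
        for _text, entity in ANNOTATION.findall(line):
            counts[entity] += 1
    return counts

sample = """## intent:food_request
- I'd like beef [tacos](plates) and a [burrito](plates)
- How about some [mapo tofu](plates)
"""
counts = count_entity_examples(sample)
print(counts)
```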

Hi Tanja,

Thank you for your response.

How many entities do you have?

I have 3 entities (GivenName, Lastname, Service) that are linked to lookup tables with, on average, 150 entries each.

How many examples do you have per entity in your training data?

In my data.json:

  • GivenName: 50
  • LastName: 150
  • Service: 200

Is it only the DIETClassifier that performs badly, or also the CRFEntityExtractor?

CRFEntityExtractor performs well. DIETClassifier works well for intent detection, but is terrible at entity extraction (particularly for entities that are in the lookup tables and not in data.json).

When I train Rasa NLU, I get the following warning:

c:\users\isha\anaconda3\envs\installingrasa\lib\site-packages\rasa\nlu\featurizers\sparse_featurizer\regex_featurizer.py:79: FutureWarning: Directly including lookup tables as a list is deprecated since Rasa 1.6.
  regex_pattern = self._generate_lookup_regex(table)

I have a lookup table, which I reference this way in nlu.md:

## lookup:food
- data/food_list.txt

I have included some examples from it in the training data.

Also, both CRFEntityExtractor and DIETClassifier are in the pipeline, but with entity_recognition: False for DIETClassifier, so I want CRFEntityExtractor to use the lookup list. I also have the required RegexFeaturizer in my config.

Still, the lookup table is not being used: examples that are present in the lookup table but not in the training data are not being extracted.

I was able to identify and resolve my issue. Maybe this will help others who are having difficulty with lookups: in the NLU data, we shouldn’t put a hyphen in front of the file path for a lookup, because the RegexFeaturizer then treats the entry as a list. Only mention the path, without a hyphen, like so:

## lookup:food
data/food_list.txt
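For anyone who wants to check their own NLU file for this mistake, here is a small sketch that flags lookup sections where something that looks like a file path is written as a list item. It assumes the Rasa 1.x Markdown training data format shown in this thread; `find_list_style_lookups` is a hypothetical helper (not a Rasa function), and treating entries ending in `.txt` or `.csv` as paths is a heuristic chosen for this example.

```python
import re

SECTION = re.compile(r"^##\s*(\w+):\s*(.+?)\s*$")      # e.g. "## lookup:food"
LIST_ITEM = re.compile(r"^\s*[-*+]\s+(.*\S)\s*$")      # e.g. "- data/food_list.txt"

def find_list_style_lookups(nlu_markdown):
    """Return (table_name, entry) pairs for lookup entries that look like
    file paths but are written as list items."""
    suspicious = []
    current_lookup = None
    for line in nlu_markdown.splitlines():
        header = SECTION.match(line)
        if header:
            kind, name = header.groups()
            # Only track the section while we are inside a lookup block.
            current_lookup = name if kind == "lookup" else None
            continue
        item = LIST_ITEM.match(line)
        if current_lookup and item and item.group(1).endswith((".txt", ".csv")):
            suspicious.append((current_lookup, item.group(1)))
    return suspicious

sample = """## lookup:food
- data/food_list.txt

## lookup:plates
data/test/lookup_tables/plates.txt

## intent:food_request
- I'd like beef [tacos](plates)
"""
problems = find_list_style_lookups(sample)
print(problems)
# [('food', 'data/food_list.txt')]
```

Only the `food` table is reported, because its path is prefixed with a hyphen; the `plates` table uses the bare path and is left alone.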
