Reduce overfitting with Lookup Table

datistiquo · November 30, 2018, 1:34pm

Hey,

It is not really clear to me how to construct a lookup table. In my case the model seems to overfit: entities are only extracted if they are in the lookup table!

It is also not clear how this table actually works. As far as I understand, it trains a feature which indicates WHEN to check against the table?

Is it important to use the low feature for the entity so that the algorithm learns what the domain is? I don’t use it; maybe that is the problem?

I read “Entity extraction with the new lookup table feature in Rasa NLU”.

How does this feature cope with a table that is narrow and covers only one specific domain, when training is all about learning whether or not to check the table?

Any advice and explanations?

datistiquo · December 12, 2018, 3:43pm

Has anyone experimented with lookup tables, or is anyone using them?

JiteshGaikwad · December 12, 2018, 6:06pm

Hey @datistiquo, I have experimented with lookup tables in my restaurant bot, where I wanted to extract city names and cuisines. This is the format I used:

Lookup table definition

lookup1

cities.txt

cuisines.txt
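
The attachments above did not survive as text, so here is a minimal sketch of what such a definition could look like. It assumes the legacy Rasa NLU JSON training-data layout (a `lookup_tables` list with `name` and `elements` keys, as far as I recall it); the entries, the intent name, and the helper functions are made up for illustration and are not the poster's actual files.

```python
import json

# Hypothetical entries; in practice these would be the lines of
# cities.txt and cuisines.txt.
cities = ["Berlin", "Mumbai", "Pune"]
cuisines = ["italian", "chinese", "mexican"]


def make_example(text, intent, entity, value):
    """Build one annotated example with character offsets for the entity."""
    start = text.index(value)
    return {
        "text": text,
        "intent": intent,
        "entities": [
            {"start": start, "end": start + len(value),
             "value": value, "entity": entity}
        ],
    }


def build_training_data(lookup_tables, examples):
    """Assemble a dict in the (assumed) legacy Rasa NLU JSON layout."""
    return {
        "rasa_nlu_data": {
            "lookup_tables": [
                {"name": name, "elements": elements}
                for name, elements in lookup_tables.items()
            ],
            "common_examples": examples,
        }
    }


examples = [
    make_example("show me italian places", "restaurant_search", "cuisine", "italian"),
    make_example("any good food in Pune", "restaurant_search", "city", "Pune"),
]
data = build_training_data({"city": cities, "cuisine": cuisines}, examples)
print(json.dumps(data, indent=2))
```

Only a few table entries appear in the annotated examples; the rest of the table is only referenced through `lookup_tables`.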

datistiquo · December 12, 2018, 6:12pm

How should this help with the stated issue of overfitting?

akelad · December 20, 2018, 11:16am

@datistiquo how many entities are in your lookup table? And how many examples of other entities do you have in your training data? And is the overfitting problem that it’s not picking up other entity types anymore, or that it’s not picking up entities of the type the lookup table specifies?

datistiquo · December 21, 2018, 4:52pm

It is just one entity class. By overfitting I mean that an entity is only recognized if it is in the table (so the pattern feature is overused), as I already said in my initial post.

akelad · January 8, 2019, 10:35am

And how many entries are in your lookup table? That is kind of expected if you’ve got a lot of entries in there.

datistiquo · January 8, 2019, 6:20pm

Many… But I thought that was the point? Like if you have various street names; there are typically a lot of them. I thought I should play with the number of examples from the table that appear in the training data? In my understanding this influences how strongly the pattern is learned. Why should the number of entries play a role?

akelad · January 9, 2019, 9:56am

Because it will overfit eventually, but having a lot of entries is fine. But why do you have some entries in your training data that aren’t in the lookup table then? They should also be in the lookup table

datistiquo · January 9, 2019, 11:49am

I think we misunderstand each other?! Because you write this in the docs: put some examples from the table in the training data… That is the point of using the table, not putting in everything from the table…

datistiquo · January 9, 2019, 11:51am

That sounds weird and contradictory.

Did I not describe my problem precisely, or why do you obviously not understand me?

akelad · January 9, 2019, 12:03pm

Yes, it seems you’ve not described your problem clearly enough. I’m not sure anymore what the issue is that you’re having.

datistiquo · January 9, 2019, 12:12pm

OK, I'll try again.

I use a lookup table with about 200 product names. For training, I currently use 6 values from this lookup table in the training data. Now it seems to overfit, in the sense that the table's pattern has such a strong impact that entity values are only recognised if they are in the table. Before, without the table, it correctly recognised arbitrary names as entity values.

Maybe those 6 table values in the training data are too many, because I otherwise have only 5 values for training at all (since I don’t use the ‘low’ feature, I don't need many entity examples).

akelad · January 10, 2019, 2:54pm

Ohh so do you mean that it doesn’t recognise values that are neither in your training data nor in the lookup table? As in it doesn’t generalise anymore?
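
One way to make "doesn't generalise anymore" measurable is to score held-out sentences separately, depending on whether the gold entity value is in the lookup table. The following is a minimal sketch under stated assumptions: `recall_by_group` and `table_only_extractor` are hypothetical helpers, and `extract` stands in for whatever wraps the trained model and returns the extracted entity values for a text. It is not a Rasa API.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple


def recall_by_group(
    samples: Iterable[Tuple[str, str]],
    lookup: Set[str],
    extract: Callable[[str], List[str]],
) -> Dict[str, float]:
    """Recall of `extract`, split by whether the gold value is in the table.

    `samples` holds (text, gold_value) pairs; `lookup` holds lowercased entries.
    """
    hits = {"in_table": 0, "out_of_table": 0}
    totals = {"in_table": 0, "out_of_table": 0}
    for text, gold in samples:
        group = "in_table" if gold.lower() in lookup else "out_of_table"
        totals[group] += 1
        if gold.lower() in [value.lower() for value in extract(text)]:
            hits[group] += 1
    # Skip groups without samples to avoid dividing by zero.
    return {g: hits[g] / totals[g] for g in totals if totals[g]}


# Toy stand-in that only ever returns table entries, mimicking the
# behaviour described in this thread.
lookup = {"alpha", "beta"}


def table_only_extractor(text: str) -> List[str]:
    return [tok for tok in text.lower().split() if tok in lookup]


samples = [
    ("i want alpha", "alpha"),
    ("order beta please", "beta"),
    ("do you sell gamma", "gamma"),
    ("price of delta", "delta"),
]
scores = recall_by_group(samples, lookup, table_only_extractor)
print(scores)  # {'in_table': 1.0, 'out_of_table': 0.0}
```

A large gap between the two numbers on real held-out data would point to the model leaning on the pattern feature rather than on context.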

datistiquo · January 10, 2019, 7:16pm

yes!

BTW:

Is there any real difference between using regex patterns and a lookup table? Could I also put the words from the lookup table into the regex pattern format? Are both technically the same?

I want to modify the lookup table with n-grams, as in “Entity extraction with the new lookup table feature in Rasa NLU”,

and I asked myself whether I should rather do this with n-grams as a regex pattern instead of modifying how the lookup table is used. Is it as simple as that?

datistiquo · January 11, 2019, 2:00pm

Hey,

Could you explain why the pattern feature for my table gets weights of the same magnitude but opposite sign for the O (outside) label and the entity label?

0:pattern:entity label: O weight: -0.994555
0:pattern:Leistung label: entity weight: 0.994555

Since this is zero-sum, I don’t know why the lookup table improves the results at all (it does).

My config for NER:

- name: "ner_crf"
  BILOU_flag: false
  features: [
    # features of the token before
    ["prefix5", "suffix3"],
    # features of the current token
    ["pattern"],
    # features of the token after
    ["prefix5", "suffix3"]
  ]
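
On the weights question: in a log-linear model such as a CRF, only the difference between label scores matters, so equal and opposite weights do not cancel out. The following worked sketch uses the two weights quoted above and deliberately ignores every other feature and all transition weights (a strong simplification); the label names and helper functions are illustrative.

```python
import math

# Weights of the pattern feature as quoted in the post, one per label.
WEIGHTS = {"O": -0.994555, "entity": 0.994555}


def label_probabilities(scores):
    """Softmax over per-label scores (shifted by the max for stability)."""
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def token_probabilities(pattern_fires):
    """Label distribution for one token when only the pattern feature counts."""
    f = 1.0 if pattern_fires else 0.0
    return label_probabilities({label: w * f for label, w in WEIGHTS.items()})


with_match = token_probabilities(True)
without_match = token_probabilities(False)

# The log-odds shift is the difference of the two weights, i.e. about 1.99.
log_odds = math.log(with_match["entity"] / with_match["O"])
print(without_match["entity"])  # 0.5 when the feature is silent
print(with_match["entity"])     # roughly 0.88 when the table matches
print(log_odds)
```

So a token that matches the table is pushed from even odds to roughly 88% "entity" in this toy setting, which is consistent with the table improving results and also with tokens outside the table losing out.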

akelad · January 17, 2019, 5:00pm

Yeah, you can also put it in regex format; they work pretty much the same way.
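
To illustrate why the two are close to equivalent, here is a minimal sketch that turns a list of lookup entries into one case-insensitive regex and derives a per-token "pattern" flag from it. As far as I know Rasa NLU does something similar internally, but treat that as an assumption; `lookup_to_regex` and `pattern_feature` are hypothetical names, not library functions.

```python
import re


def lookup_to_regex(entries):
    """Combine lookup entries into one word-bounded, case-insensitive regex."""
    # Longest entries first so "New York" wins over "York".
    ordered = sorted(entries, key=len, reverse=True)
    alternation = "|".join(re.escape(entry) for entry in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def pattern_feature(text, pattern):
    """Return (token, flag) pairs; flag is True if the token overlaps a match."""
    spans = [m.span() for m in pattern.finditer(text)]
    features = []
    for tok in re.finditer(r"\S+", text):
        overlaps = any(tok.start() < end and tok.end() > start
                       for start, end in spans)
        features.append((tok.group(), overlaps))
    return features


pattern = lookup_to_regex(["New York", "Berlin", "York"])
print(pattern_feature("flights from new york to Paris", pattern))
# [('flights', False), ('from', False), ('new', True), ('york', True),
#  ('to', False), ('Paris', False)]
```

The CRF only ever sees the boolean flag, so a hand-written regex and a lookup table that produce the same matches give the model the same information.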

datistiquo · January 18, 2019, 11:01am

Thanks. What about the overfitting issue, and maybe the rest of the posts above?
