Reduce overfitting with Lookup Table

Hey,

It is not really clear to me how to construct a lookup table. In my case it seems to overfit, so that entities are only extracted if they are in the lookup table!

It is also not clear to me how this table actually works. As far as I understand, it trains a feature that indicates WHEN to check against the table?

Is it important to use the low feature for the entity so that the algorithm learns what the domain is? I don’t use it; maybe that is the problem?

I read “Entity extraction with the new lookup table feature in Rasa NLU”.

How does this feature cope with a table that is narrow, i.e. limited to one specific domain, when training is all about learning whether to check the table or not?

Any advice and explanations? :slight_smile:

Has anyone experimented with lookup tables, or is anyone using them?

Hey @datistiquo, I have experimented with the lookup table in my restaurant bot, where I wanted to extract city names and cuisines. This is the format I used:

Lookup table definition (files referenced):

cities.txt
cuisines.txt

How should this help with the stated issue of overfitting?

@datistiquo how many entities are in your lookup table? And how many examples of other entities do you have in your training data? Is the overfitting problem that it’s not picking up other entity types anymore, or that it’s not picking up entities of the type the lookup table specifies?

It is just one entity class. By overfitting I mean that an entity is only recognized if it is in the table (so the pattern feature is overused), as I already said in my initial post.

And how many entries are in your lookup table? That is kind of expected if you’ve got a lot of entries in there

Many… But I thought that was the point? For example, if you have various street names, that is typically a lot. I thought I should play with the number of examples from the table that I put into the training data? In my understanding this influences how strongly the pattern is learned. Why should the number of entries play a role?

Because it will overfit eventually, but having a lot of entries is fine. But why do you have some entries in your training data that aren’t in the lookup table then? They should also be in the lookup table

I think we misunderstand each other?! Because you write this in the docs: put some examples from the table in the training data… That is the point of using the table, not putting in everything from the table…

That sounds weird and contradictory.

Did I not describe my problem precisely, or why do you obviously not understand me? :smile:

Yes it seems you’ve not described your problem clearly enough, I’m not sure what the issue is you’re having anymore

OK, I’ll try again.

I use a lookup table with product names, about 200 of them. For training I currently use 6 values from this lookup table in the training data. Now it seems to overfit, in the sense that the table’s pattern feature has such a strong impact that entity values are only recognised if they are in the table. Before, without the table, it correctly recognised arbitrary names as entity values.

Maybe those 6 values from the table are too many, because I only have around 5 values for training at all (since I don’t use the ‘low’ feature, I don’t need many entity examples).

Ohh so do you mean that it doesn’t recognise values that are neither in your training data nor in the lookup table? As in it doesn’t generalise anymore?

yes!

BTW:

Is there any real difference between using regex patterns and a lookup table? Can I also put the words from the lookup table into the regex pattern format? Are both technically the same?

I want to modify the lookup table with n-grams, as described in “Entity extraction with the new lookup table feature in Rasa NLU”,

and I asked myself whether it would be better to do this with n-grams as a regex pattern instead of modifying how the lookup table is used. Is it as simple as that?
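To make the n-gram idea concrete, here is a minimal sketch of pulling frequent character n-grams out of a lookup table and turning them into a single regex. It is not the blog post's exact procedure; the function names (`char_ngrams`, `top_ngrams`, `ngram_regex`) and the street names are made up for illustration, and picking the most frequent n-grams is just one simple selection rule.

```python
import re
from collections import Counter

def char_ngrams(word, n):
    """All lowercase character n-grams of a word."""
    word = word.lower()
    return [word[i:i + n] for i in range(len(word) - n + 1)]

def top_ngrams(entries, n=3, k=5):
    """The k n-grams that occur in the most lookup-table entries."""
    counts = Counter()
    for entry in entries:
        # Count each n-gram at most once per entry.
        counts.update(set(char_ngrams(entry, n)))
    return [gram for gram, _ in counts.most_common(k)]

def ngram_regex(ngrams):
    """One case-insensitive alternation over the chosen n-grams."""
    return re.compile("(?i)(" + "|".join(re.escape(g) for g in ngrams) + ")")

# Hypothetical lookup-table entries.
streets = ["Hauptstrasse", "Bahnhofstrasse", "Gartenstrasse", "Lindenweg"]
grams = top_ngrams(streets, n=3, k=5)
pattern = ngram_regex(grams)

print(sorted(grams))
print(bool(pattern.search("Schillerstrasse")))  # a name that is not in the table
```

Because the pattern is built from word fragments rather than whole entries, it can also fire on names that are not in the table, which is the property you would want if the full-entry pattern generalises poorly. The resulting pattern string could presumably be used as an ordinary regex feature in the training data.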

Hey,

could you explain why the pattern feature weights for my table have the same magnitude but opposite sign for the O (outside) label and the entity label?

0:pattern:entity label: O weight: -0.994555
0:pattern:Leistung label: entity weight: 0.994555

Since this is zero-sum, I don’t know why the lookup table improves the results at all (it does).
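One possible way to read these two weights: in a CRF the label scores are compared with each other, not added up, so a positive weight for one label and an equal negative weight for the other push in the same direction. The toy calculation below is only a sketch under strong assumptions: it looks at a single token, ignores transition weights and every other feature, and treats the label choice as a plain softmax over the two scores.

```python
import math

def label_probs(scores):
    """Softmax over per-label scores (transitions and other features ignored)."""
    top = max(scores.values())
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

W = 0.994555  # magnitude of the two pattern weights quoted above

# Pattern feature fires: entity gets +W, O gets -W.
fired = label_probs({"entity": +W, "O": -W})

# Pattern feature does not fire: it contributes nothing to either label.
not_fired = label_probs({"entity": 0.0, "O": 0.0})

print(fired)
print(not_fired)
```

In this simplified picture the gap between the two scores is 2 * W, so a token that matches the table ends up with an entity probability of roughly 0.88, while a token that does not match stays at 0.5 as far as this feature alone is concerned. The weights are not cancelling out; they are both votes for "entity" whenever the pattern matches.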

My config for NER:

- name: "ner_crf"
  "BILOU_flag": False
  "features": [
    ["prefix5", "suffix3"],
    ["pattern"],
    ["prefix5", "suffix3"]]

Yeah you can also put it in regex format, they work pretty much the same way
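As a rough illustration of why the two behave alike, the sketch below compiles lookup-table entries into one big case-insensitive regex and marks every token that overlaps a match. This is an assumption about the general mechanism, not a copy of Rasa's code; `lookup_to_regex`, `pattern_feature`, and the product names are hypothetical.

```python
import re

def lookup_to_regex(entries):
    """Compile lookup-table entries into one case-insensitive alternation."""
    escaped = [re.escape(e.strip()) for e in entries if e.strip()]
    return re.compile(r"(?i)(\b" + r"\b|\b".join(escaped) + r"\b)")

def pattern_feature(text, regex):
    """Pair each whitespace token with True if it overlaps a regex match."""
    spans = [m.span() for m in regex.finditer(text)]
    features = []
    for tok in re.finditer(r"\S+", text):
        hit = any(tok.start() < end and tok.end() > start for start, end in spans)
        features.append((tok.group(), hit))
    return features

# Hypothetical lookup-table entries.
products = ["Basic", "Premium Plus"]
regex = lookup_to_regex(products)

print(pattern_feature("I want the premium plus plan", regex))
print(pattern_feature("I want the gold plan", regex))
```

The second sentence shows the limitation discussed in this thread: a product that is not in the table never triggers the pattern feature, so the model has to rely on its other features to recognise it.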

Thanks. What about the overfitting issue, and maybe the rest of the posts above? :slight_smile: