It is not really clear to me how to construct a lookup table. In my case it seems to overfit: entities are only extracted if they are in the lookup table!
It is also not clear how the table actually works. As far as I understand, it trains a feature that indicates WHEN to check the table?
Is it important to use the ‘low’ feature for the entity so that the algorithm learns the domain? I don’t use it; maybe that is the problem?
How does this feature account for a table that is narrow and covers only one specific domain, when all that gets trained is whether to check the table or not?
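To make my current understanding concrete, here is a minimal sketch in plain Python. It is not Rasa's actual code: `lookup_features` and the product names are made up for illustration, and it assumes single-word entries. The idea is that the table only produces one yes/no flag per token, and the entity model then learns how much weight to give that flag next to its other features.

```python
def lookup_features(tokens, lookup_table):
    """Return one binary flag per token: True if the token is a table entry.

    Hypothetical helper for illustration only; assumes single-word entries.
    """
    entries = {entry.lower() for entry in lookup_table}
    return [token.lower() in entries for token in tokens]


# Made-up example sentence and table entries.
tokens = "I want to order the Foobar please".split()
flags = lookup_features(tokens, ["Foobar", "Bazqux"])

# Each token is paired with its flag; a model trained mostly on examples where
# the flag is True can end up relying on it too heavily.
for token, flag in zip(tokens, flags):
    print(token, flag)
```

If that picture is right, the flag itself carries no domain knowledge; it only says "this token is in the table", and everything else depends on how the training examples are balanced.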
Hey @datistiquo, I have experimented with the lookup table in my restaurant bot, where I wanted to extract city names and cuisines. This is the format I used:
@datistiquo How many entities are in your lookup table? And how many examples of other entities do you have in your training data? And is the overfitting problem that it’s no longer picking up other entity types, or that it’s not picking up entities of the type the lookup table specifies?
It is just one entity class. By overfitting I mean that an entity is only recognized if it is in the table (so the pattern feature is overused), as I already said in my initial post.
Many… But I thought that was the point? Take street names, for example: there are typically a lot of them. I thought I should experiment with how many examples from the table appear in the training data? In my understanding this influences how strongly the pattern is learned. Why should the number of entries play a role?
Because it will overfit eventually, but having a lot of entries is fine. But why do you have some entries in your training data that aren’t in the lookup table, then? They should also be in the lookup table.
I think we misunderstand each other?! Because you write this in the docs: put some examples from the table in the training data… That is the point of using the table, rather than putting in everything from the table…
I use a lookup table with product names, about 200 of them. For training I currently use 6 values from this lookup table in the training data. Now it seems to overfit, in the sense that the table pattern has such a strong impact that entity values are only recognised if they are in the table. Before, without the table, it correctly recognised arbitrary names as entity values.
Maybe those 6 values from the table are too many, since I otherwise have only about 5 values for training (I don’t use the ‘low’ feature, so I don’t need many entity examples).
Ohh, so do you mean that it doesn’t recognise values that are neither in your training data nor in the lookup table? As in, it doesn’t generalise anymore?
Is there any real difference between using regex patterns and a lookup table? Could I also put the words from the lookup table into the regex pattern format?
Are the two technically the same?
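To illustrate what I mean, here is a minimal sketch using only Python's `re` module. It assumes the table is simply compiled into one case-insensitive alternation with word boundaries; `lookup_to_regex` is a hypothetical helper, not a Rasa function, and whether the library does exactly this internally is an assumption on my side.

```python
import re


def lookup_to_regex(entries):
    """Compile lookup-table entries into a single alternation pattern.

    Hypothetical helper for illustration; entries are escaped so they are
    matched literally, and word boundaries prevent partial-word matches.
    """
    escaped = [re.escape(entry) for entry in entries]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


# Made-up product names standing in for the real table.
pattern = lookup_to_regex(["Foobar", "Baz Qux"])

match = pattern.search("Do you sell baz qux?")
print(match.group(0) if match else None)

# A longer word that merely contains an entry does not match.
print(pattern.search("foobarish"))
```

If the two really are equivalent in this sense, then writing the entries by hand as a regex should produce the same pattern feature as the table, and the overfitting behaviour would be the same either way.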