Lookup Table or Multiple Examples?

I am facing a problem with entity extraction for an entity “Company_Name”.

One example variant: what are the current assets for Alpha Steels company.

Alpha Steels is a company name. There would be a hundred company names in my list.

Should I add examples for all the companies, or would a lookup table solve this? And do I need to add any options to the ner_crf pipeline?

I just wanted to know before implementing the lookup.

This depends: do you really have a fixed list of possible companies, or is it any company in the world? If it’s a fixed list, then a lookup table could work. You still need to provide some example sentences with these entities in them, though. Otherwise, without the lookup table, you can provide a bunch of training examples and eventually the ner_crf will learn to generalise.

Suppose I have a fixed list of 1000 companies. Do I have to use each one of them in the training examples? If yes, then what purpose is the lookup table feature really serving?

For my use case I have only 1 intent and 18 entities. 12 of those entities each have a “fixed” list of more than 500 values. I have more than 500 input statement patterns. I am using Chatito to generate the training data, and if I use all the values of all the entities, the training set will be humongous and so will the training time. I was really expecting the lookup feature to solve this problem, but it doesn’t seem to.

Could you please tell me whether the lookup feature requires something else alongside it that I might be missing?

Hi @akelad, as @kishanbajaj was asking: do I have to create examples for every company in the list? If so, how does the lookup table help? Please answer @kishanbajaj’s question in the comment.

Nope, you don’t have to use each one in your training examples, just a few of them. The lookup table will then do the rest. The CRF just needs to learn the pattern of when to extract these entities.
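To make the idea concrete, here is a rough sketch of how a lookup table can act as a per-token "is this in the list?" signal that a CRF then learns to rely on. This is only an illustration in plain Python, not Rasa's actual implementation; the `LOOKUP` values and the `build_lookup_pattern` and `token_features` helpers are hypothetical names made up for this example.

```python
import re

# Hypothetical lookup table; in practice this would be the fixed company list.
LOOKUP = ["Alpha Steels", "Beta Motors", "Gamma Foods"]


def build_lookup_pattern(values):
    """Compile one case-insensitive regex matching any lookup value as a whole phrase."""
    # Put longer values first so they win over shorter overlapping ones.
    alternatives = sorted((re.escape(v) for v in values), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def token_features(text, pattern):
    """Return (token, in_lookup) pairs.

    in_lookup is True when the token lies entirely inside a lookup match.
    """
    spans = [m.span() for m in pattern.finditer(text)]
    features = []
    for m in re.finditer(r"\w+", text):
        in_lookup = any(start <= m.start() and m.end() <= end for start, end in spans)
        features.append((m.group(), in_lookup))
    return features


if __name__ == "__main__":
    pattern = build_lookup_pattern(LOOKUP)
    for token, flag in token_features("what are the current assets for Beta Motors company", pattern):
        print(token, flag)
```

The point of the sketch: the flag is the same for "Beta Motors" as for "Alpha Steels", so a model trained on sentences containing only a few of the names can, in principle, still pick up the others, provided it has learned to trust the flag rather than memorise the specific words.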

Thank you for the information… :slight_smile:

Could you please elaborate on “a few”? Because it does not seem to work.

I tried with:

100+ patterns, 6 entities and ~100,000 training statements

1 entity had a total of 5 fixed values and I used 2 in training. Another entity had 9 fixed values and I used 5 in training. For both entities, Rasa could not predict the remaining values, even though they were present alongside the trained values in the respective lookup tables.

Has anybody had success with lookup tables? I would really like to know, as it seems like an awesome feature, but I am having a hard time getting it to give results.

Well, 100,000 seems like far too much. Your model is probably overfitting to those values now.
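One way to keep the set small is to draw a capped random sample of template and value combinations instead of generating every combination. A minimal sketch, assuming templates with a `{company}` placeholder and the `[value](entity)` annotation style for markdown training data; the `generate_examples` function and the sample templates are hypothetical, and the right cap is something to find by experiment.

```python
import random


def generate_examples(templates, values, n_examples, seed=0):
    """Sample up to n_examples distinct (template, value) combinations."""
    templates = sorted(set(templates))
    values = sorted(set(values))
    total = len(templates) * len(values)
    rng = random.Random(seed)
    # Sample combination indices without replacement, so no example repeats.
    picks = rng.sample(range(total), min(n_examples, total))
    examples = []
    for idx in picks:
        t_idx, v_idx = divmod(idx, len(values))
        annotated = "[{}](Company_Name)".format(values[v_idx])
        examples.append(templates[t_idx].format(company=annotated))
    return examples


if __name__ == "__main__":
    templates = [
        "what are the current assets for {company} company",
        "show me the balance sheet of {company}",
    ]
    values = ["Alpha Steels", "Beta Motors", "Gamma Foods"]
    for line in generate_examples(templates, values, 4):
        print("- " + line)
```

Because the combinations are sampled uniformly, the examples spread across many entity values rather than repeating the same handful thousands of times.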

I’m seeing much the same thing. I have generated about 100 training examples using 5 different values for my entity, then (as a test) I have specified 3 of the 5 and 1 other new value as lookup values. I see matches on the values in the training data, but not for the new value in the lookup list.

What does your pipeline look like?

Thanks for the reply Akela. I have managed to solve this by changing my training data to have a wider selection of entity value examples, so there are far fewer duplicates. It was overtraining on the examples I had given, but it now picks out place names (in this case) by context rather than by matching to a list. On reflection, I think it makes more sense to work this way and then match the extracted values to my list in an action, to better handle typos, ambiguities, etc.
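For the matching step, something along these lines could work. It is a sketch using only Python's standard `difflib`; the `KNOWN_PLACES` list, the `resolve_entity` helper and the 0.8 cutoff are assumptions for illustration, and the function would be called from whatever action receives the extracted entity.

```python
import difflib

# Hypothetical fixed list of valid values.
KNOWN_PLACES = ["Manchester", "Liverpool", "Birmingham"]


def resolve_entity(extracted, known_values, cutoff=0.8):
    """Map an extracted value to the closest known value, or None if nothing is close enough."""
    # Compare in lower case, but return the canonical spelling from the list.
    lowered = {v.lower(): v for v in known_values}
    matches = difflib.get_close_matches(extracted.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


if __name__ == "__main__":
    print(resolve_entity("manchestr", KNOWN_PLACES))
    print(resolve_entity("Tokyo", KNOWN_PLACES))
```

A `None` result is a natural place to ask the user to clarify instead of guessing.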

You can try this for names: Providing conversation context to the NLU using microservices

Together with a lookup table it worked very well for us in getting first names and last names.

If the company list is not fixed, then what should I do?