It depends. Do you really have a fixed list of possible companies, or could it be any company in the world? If it’s a fixed list, a lookup table could work, but you still need to provide some example sentences containing these entities. Without a lookup table, you can instead provide a large number of training examples, and eventually ner_crf will learn to generalise.
Suppose I have a fixed list of 1000 companies. Do I have to use each one of them in the training examples? If so, what purpose does the lookup table feature really serve?
For my use case I have only 1 intent and 18 entities. 12 of those entities each have a “fixed” list of more than 500 values, and I have more than 500 input sentence patterns. I am using Chatito to generate the training data, and if I use every value of every entity, the training set will be humongous and so will the training time. I was really expecting the lookup feature to solve this problem, but it doesn’t seem to.
Could you please tell me whether the lookup feature requires something else alongside it that I might be missing?
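For the size problem described above, one option is to sample entity values per generated sentence instead of enumerating every pattern/value combination. The following is only a minimal sketch under assumptions: `sample_examples`, the patterns, and the value lists are hypothetical, and it is not how Chatito itself works. The output uses the `[value](entity)` annotation style of Rasa's Markdown training data.

```python
import random

def sample_examples(patterns, entity_values, n_examples, seed=0):
    """Fill each randomly chosen pattern with randomly chosen entity values.

    The result is capped at n_examples instead of growing with the full
    cross product of patterns and values.
    """
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    examples = []
    for _ in range(n_examples):
        pattern = rng.choice(patterns)
        # Annotate each sampled value as [value](entity_name).
        filled = {
            name: "[{}]({})".format(rng.choice(values), name)
            for name, values in entity_values.items()
        }
        # str.format ignores entities that the pattern does not mention.
        examples.append(pattern.format(**filled))
    # Drop exact duplicates while keeping the original order.
    return list(dict.fromkeys(examples))

# Hypothetical patterns and values, for illustration only.
patterns = [
    "schedule a call with {company} in {city}",
    "who is my contact at {company}",
]
entity_values = {
    "company": ["Acme Corp", "Globex", "Initech", "Umbrella"],
    "city": ["Berlin", "Pune", "Austin"],
}
examples = sample_examples(patterns, entity_values, n_examples=20)
```

With 500 patterns and 12 large value lists, `n_examples` then controls training-set size directly, while each entity still appears with many different values.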
Hi @akelad, as @kishanbajaj was asking: do I have to create examples for every company in the list? If so, how does the lookup table help? Please answer @kishanbajaj’s question in the comment.
Nope, you don’t have to use each one in your training examples, just a few of them. The lookup table will then do the rest. The CRF just needs to learn the pattern of when to extract these entities.
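To illustrate the idea in the reply above, here is a conceptual sketch of a lookup table acting as an extra per-token feature. It is not Rasa's actual code: `build_lookup_pattern`, `token_features`, and the company names are hypothetical. The lookup table only tells the model "this token is on the list"; the CRF still has to learn from the training examples that this flag goes together with the entity label.

```python
import re

def build_lookup_pattern(values):
    """Compile the lookup values into one case-insensitive regex."""
    # Longest first, so "Acme Corp" is preferred over a shorter "Acme".
    ordered = sorted(values, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def token_features(text, pattern):
    """Return (token, in_lookup) pairs for whitespace-separated tokens."""
    matched_spans = [m.span() for m in pattern.finditer(text)]
    features = []
    for tok in re.finditer(r"\S+", text):
        # A token is flagged if it overlaps any lookup match.
        in_lookup = any(
            tok.start() < end and start < tok.end()
            for start, end in matched_spans
        )
        features.append((tok.group(), in_lookup))
    return features

# Hypothetical lookup values, for illustration only.
companies = ["Acme Corp", "Globex", "Initech"]
pattern = build_lookup_pattern(companies)
features = token_features("book a meeting with acme corp tomorrow", pattern)
```

Here `acme` and `corp` are flagged even if that company never appeared in the training data. Whether the model then extracts them depends on how strongly it has learned to rely on the flag rather than on memorised words.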
Could you please elaborate on “a few”? Because it does not seem to work.
I tried with:
100+ patterns, 6 entities and ~100,000 training statements
One entity had 5 fixed values in total and I used 2 in training. Another entity had 9 fixed values and I used 5 in training. For both entities, RASA could not predict the remaining values, even though they were present alongside the trained values in the respective lookup tables.
Has anybody had success with lookup tables? I would really like to know, as it seems like an awesome feature, but I am having a hard time getting results from it.
I’m seeing much the same thing. I generated about 100 training examples using 5 different values for my entity, then (as a test) specified 3 of those 5 plus 1 new value as lookup values. I see matches on the values in the training data, but not on the new value in the lookup list.
Thanks for the reply Akela. I managed to solve this by changing my training data to have a wider selection of entity value examples, so there are far fewer duplicates. The model was overtraining on the examples I had given, but it now picks out place names (in this case) by context rather than by matching them to a list. On reflection, I think it makes more sense to work this way and then match the extracted values to my list in an action, to better handle typos, ambiguities, etc.
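The "match them to my list in an action" step could look roughly like the sketch below. It uses Python's standard `difflib` for fuzzy matching. `resolve_entity`, the place names, and the `cutoff` of 0.8 are assumptions for illustration, not part of Rasa, and the cutoff would need tuning on real data.

```python
import difflib

def resolve_entity(extracted, known_values, cutoff=0.8):
    """Map an extracted entity value to the closest known value, or None."""
    # Compare in lowercase, but return the canonical spelling from the list.
    lowered = {value.lower(): value for value in known_values}
    matches = difflib.get_close_matches(
        extracted.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None

# Hypothetical list of valid place names.
known_places = ["Manchester", "Birmingham", "Edinburgh"]

resolved = resolve_entity("manchestr", known_places)  # typo for a listed place
unknown = resolve_entity("Paris", known_places)       # not on the list
```

A function like this could be called from a custom action on whatever value the CRF extracted by context. A `None` result is a natural point to ask the user to clarify.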