Lookup tables and entity training

Hej, I have a few questions about entity extraction and the use of lookup tables:

I read in the documentation and in other forum posts that we need to provide some examples in intents for entities that are defined in a lookup table. But it is never specified how many examples are needed, or how that number scales with the length of the lookup table. For example, if I have a lookup table with length n and I need ~3 examples for rasa to recognise the entities, does a lookup table with length 100n need ~300 examples?

Furthermore, I am currently working on a student project that compares the capabilities of rasa and IBM Watson. In IBM Watson it is possible to provide an entity name and a list of values for entity training, without providing labeled examples within the intents. Is there an approach to achieve something similar in rasa?

Thanks in advance :slight_smile: !

For example if I have a lookup table with length n and I need ~3 examples for rasa to recognise the entities, does a lookup table with length 100n need ~300 examples?

You can first define your sample intents and entities by following the Writing Conversation Data steps. Once that is done, do the following:

To train the nlu model, you can just run the following command: rasa train nlu

You can test the model by running an interactive shell mode via the following command: rasa shell nlu

Then you can see the score predicted against the intent and the extracted entities. That will give you the ability to judge if the intent samples you provided in nlu.md are enough or not.
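If you want to check those scores in a script rather than by eye, here is a minimal sketch. It assumes the parse result is a dictionary with an "intent" entry (name and confidence) and an "entities" list, which is the shape the shell prints as far as I know. The sample values, the summarize_parse helper and the 0.7 threshold are all made up for illustration.

```python
def summarize_parse(result, min_confidence=0.7):
    """Condense one NLU parse result into the parts worth eyeballing.

    `min_confidence` is an arbitrary, assumed threshold; tune it for your data.
    """
    intent = result.get("intent") or {}
    confidence = intent.get("confidence", 0.0)
    # Keep only the entity type and the extracted value for each entity.
    entities = [(e["entity"], e["value"]) for e in result.get("entities", [])]
    return {
        "intent": intent.get("name"),
        "confidence": confidence,
        "confident": confidence >= min_confidence,
        "entities": entities,
    }


# Hypothetical parse result, hand-written to mimic the assumed output shape.
sample = {
    "text": "book a flight to Zurich",
    "intent": {"name": "book_flight", "confidence": 0.93},
    "entities": [{"entity": "city", "value": "Zurich", "start": 17, "end": 23}],
}

summary = summarize_parse(sample)
print(summary)
```

Running this over a handful of test sentences that use lookup table values you did not put in the training examples should show quickly whether more examples are needed.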

See this for reference: A Beginner’s Guide to Rasa NLU for Intent Classification and Named-entity Recognition | by Ng Wai Foong | Towards Data Science

Give it a shot and see how it goes.


Hey @Firefloat, not sure if you have seen it yet, but check out this blog post on lookup tables: Entity extraction with the new lookup table feature in Rasa NLU | The Rasa Blog | Rasa

In IBM Watson there is the possibility to provide an entity name and a list of values for entity training, without providing labeled examples within the intents. Is there an approach to achieve something similar in rasa?

Lookup tables are meant to be a way of improving the models by using more regular expressions to featurize intents and entities. They are not a list of entities per se, i.e. just because you included a value in the lookup table does not mean it will get picked up as that entity everywhere. This is intentional, because whether or not a word is picked up as an entity should depend not only on the word itself but also on the sentence context. As described in the blog post:

Rather than directly returning matches, these lookup tables work by marking tokens in the training data to indicate whether they’ve been matched. This provides an extra set of features to the conditional random field entity extractor (ner_crf). This lets you identify entities that haven’t been seen in the training data and also eliminates the need for any post-processing of the results.
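To make the "marking tokens" idea concrete, here is a rough sketch in plain Python. It is not Rasa's actual implementation: the function names, the whitespace tokenizer and the city list are assumptions for illustration. It only shows the general technique of turning a lookup table into one regular expression and then flagging each token that overlaps a match, so that the flag can be used as one feature among others.

```python
import re


def build_lookup_pattern(lookup_values):
    """Compile one case-insensitive regex that matches any lookup entry."""
    # Longest entries first, so multi-word values win over shorter overlaps.
    escaped = sorted((re.escape(v) for v in lookup_values), key=len, reverse=True)
    return re.compile(r"\b(" + "|".join(escaped) + r")\b", re.IGNORECASE)


def mark_tokens(sentence, pattern):
    """Return (token, in_lookup) pairs for a whitespace-tokenized sentence."""
    matched_spans = [m.span() for m in pattern.finditer(sentence)]
    features = []
    for tok in re.finditer(r"\S+", sentence):
        start, end = tok.span()
        # A token is flagged if its character span overlaps any lookup match.
        in_lookup = any(s < end and start < e for s, e in matched_spans)
        features.append((tok.group(), in_lookup))
    return features


cities = ["Berlin", "New York", "Zurich"]  # hypothetical lookup table
pattern = build_lookup_pattern(cities)
features = mark_tokens("book a flight from zurich to New York", pattern)

for token, in_lookup in features:
    print(f"{token:8} in_lookup={in_lookup}")
```

The flag is only a hint to the extractor. The model still has to learn from labeled examples how much to trust it in a given sentence context, which is why a few annotated examples per entity are needed even when a lookup table is present.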


Thank you both! :slight_smile: I’ll try your suggestions