Making rasa nlu to identify similar entities based on trained entities

syedrizvi · September 4, 2018, 5:39am

Hello team,

I am trying to get Rasa NLU to identify entities. My pipeline is below:

pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"
- name: "ner_crf"
- name: "ner_synonyms"

I have given sufficient examples to identify chicken and pizza as an entity called ‘dish’. My intention was that if I just define another value, say burger, as a ‘dish’, include it in my data.json file, and retrain (without giving exhaustive examples as I did for chicken and pizza), the model should start picking up burger in the same contexts as chicken and pizza.

For example, if I say "I am looking for an Italian burger recipe", it should identify burger as an entity of type ‘dish’. Currently it does not. How can this be achieved so that the chatbot generalizes on entities? Do we need any tuning in ner_crf?

Thanks.

syedrizvi · September 4, 2018, 9:00am

Update: if I give some example data for burger (I gave around 3 examples marking burger as the entity ‘dish’), it seems to generalize well for burger in other kinds of sample conversations.

But my issue still remains: do I need to provide sample conversations for each value of the entity ‘dish’? I wanted it to generalize from just defining a value as a ‘dish’, and to fit that value into the already trained data where the entity ‘dish’ is used in context.

For example: I am looking for a spicy burger recipe

It should understand that burger is a ‘dish’ because we have trained on similar example for chicken and we have defined burger as a ‘dish’.

Thanks.

syedrizvi · September 5, 2018, 6:08am

I would appreciate it if someone could share their thoughts on this. Thanks.

Sam · September 5, 2018, 3:47pm

Have a look at Chatito

https://rodrigopivi.github.io/Chatito/

You can use this tool to generate more data.

akshay2000 · September 6, 2018, 3:51am

While Chatito can solve the data generation problem, I still think that the original question stands. Must we provide training data for entity extraction with all the different combinations? Is it possible for some component to just pick up new values? Is it based on the word length? For example, if I have sufficient training data for the word burger (6 characters), will mutton at least be recognized without extra training data?

syedrizvi · September 6, 2018, 6:06am

Thanks, yeah. The issue is not data generation but making an entity (i.e. ‘dish’ here) generalize, so that it works in the contexts that the other ‘dish’ values have been trained on.

Sam · September 11, 2018, 9:40am

Then you could try word embeddings, e.g. word2vec or GloVe.
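
To illustrate the intuition only: embeddings place related words close together, so an unseen ‘dish’ value can look similar to the ones seen in training. The sketch below uses hand-made toy vectors (hypothetical numbers, not real word2vec or GloVe output) and plain cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, invented for this example only.
# Real embeddings have hundreds of dimensions and come from a trained model.
toy_vectors = {
    "chicken": [0.9, 0.8, 0.1],
    "burger": [0.8, 0.9, 0.2],
    "invoice": [0.1, 0.0, 0.9],
}

sim_food = cosine(toy_vectors["chicken"], toy_vectors["burger"])
sim_other = cosine(toy_vectors["chicken"], toy_vectors["invoice"])

print("chicken vs burger :", round(sim_food, 3))
print("chicken vs invoice:", round(sim_other, 3))
```

Whether the entity extractor actually uses such vectors depends on the pipeline and its configuration, so treat this as background rather than a recipe.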

souvikg10 · September 19, 2018, 7:48pm

Give this a try

syedrizvi · September 24, 2018, 9:13am

Thanks, will have a look at it!

tahesse · November 5, 2018, 1:36pm

This should actually solve your problem

khaerulumam42 · May 6, 2019, 2:23am


ner_crf learns weights for features such as a token's prefix and suffix and the words before and after it. Given enough training data, the CRF generalizes well enough to identify new entity values, as in this case. If you want to try training a CRF on its own, you can use sklearn-crfsuite, where you can tune parameters such as the L1 and L2 regularization to best fit the model to your data and your purpose.
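
A minimal sketch of what such per-token features can look like, to show why context carries over to an unseen value. The feature set and the names `token_features` and `shared_features` are hypothetical, chosen for illustration; they are not the exact features ner_crf computes.

```python
def token_features(tokens, i):
    """Build a CRF-style feature dict for tokens[i] (illustrative feature set)."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "low": word.lower(),           # the word itself
        "prefix2": word[:2].lower(),   # first two characters
        "suffix3": word[-3:].lower(),  # last three characters
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Context features: the neighbouring words, or sentence boundary markers.
    if i > 0:
        feats["prev_low"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["next_low"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True
    return feats


def shared_features(a, b):
    """Return the feature names whose values are identical in both dicts."""
    return {k for k in a if k in b and a[k] == b[k]}


seen = "I am looking for a spicy chicken recipe".split()
unseen = "I am looking for a spicy burger recipe".split()

f_seen = token_features(seen, seen.index("chicken"))
f_unseen = token_features(unseen, unseen.index("burger"))
common = shared_features(f_seen, f_unseen)

print(sorted(common))
```

The word-level features (`low`, `prefix2`, `suffix3`) differ between chicken and burger, but the context features (`prev_low`, `next_low`) match, which is what lets a CRF tag an unseen word in a familiar position. With sklearn-crfsuite, a list of such dicts per sentence is the input to `CRF.fit`, and the `c1` and `c2` arguments of `sklearn_crfsuite.CRF` are the L1 and L2 regularization coefficients.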

akshay2000 · May 17, 2019, 10:35am

I recently started using lookup tables. However, I suddenly started getting ill-defined F-scores from the sklearn intent classifier when my lookup table is large. What could be the reason? I have opened a discussion over here.
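
For readers who have not used lookup tables, here is a minimal sketch of training data that combines annotated examples with a lookup table for ‘dish’. It assumes the Rasa NLU JSON training format of that period (`rasa_nlu_data`, `common_examples`, `lookup_tables`); the intent name `search_recipe` and the helper `annotate` are hypothetical, so check the documentation for your version before relying on the layout.

```python
import json


def annotate(text, intent, value, entity):
    """Build one training example with character offsets for a single entity."""
    start = text.index(value)  # raises ValueError if the value is not in the text
    return {
        "text": text,
        "intent": intent,
        "entities": [
            {
                "start": start,
                "end": start + len(value),
                "value": value,
                "entity": entity,
            }
        ],
    }


training_data = {
    "rasa_nlu_data": {
        # Values listed here do not each need their own full set of examples,
        # but a few annotated examples of the entity in context are still needed.
        "lookup_tables": [
            {"name": "dish", "elements": ["chicken", "pizza", "burger", "mutton"]}
        ],
        "common_examples": [
            annotate("I am looking for a spicy chicken recipe",
                     "search_recipe", "chicken", "dish"),
            annotate("I am looking for an Italian pizza recipe",
                     "search_recipe", "pizza", "dish"),
        ],
    }
}

print(json.dumps(training_data, indent=2))
```

As far as I know, in those versions the lookup table only influences ner_crf when a regex featurizer is also part of the pipeline, so that is worth verifying against the docs.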
