What is the best (or the "Rasa") way to recognize diverse entity types independently of any intent?
Let me use this shopping list bot example provided on GitHub. The bot is only able to identify a few limited food items (eggs, milk, and butter). But what would be the intended way to empower the NLU to recognize a wide range of food items, e.g. by utilizing a large food/recipe dataset containing thousands of ingredients?
So far I have these ideas, but don’t know which one would be the most appropriate:
1) Just adding many examples to the NLU training data
2) Adding a new entity to an existing spaCy model
3) Using lookup tables
My thoughts so far:
1) Seems inappropriate, since I would need to train entities and intents simultaneously. But I don't want to oversample an intent just to make sure I create a phrase variant for every ingredient.
2) Is typically not recommended, because you need to be very careful to avoid the "catastrophic forgetting" problem and to gather enough training samples.
3) Could be the way to go, but I'd rather have a "word embedding" approach that picks up similar tokens. It would be hard to make sure every reasonable ingredient is in such a large, static list of food items.
Also: how would one deal with entity types? Should I only use one "food" entity type, or multiple entity types such as "fruits", "vegetables", …?
Let’s say I want to build a chatbot that can identify any kind of ingredient and find me a recipe for it. What is the best practice for such a scenario? How can I make sure my bot identifies new, unknown ingredients as well as very common food items?
I am not 100% sure how to do this using Rasa, but I think you are looking for something more related to a knowledge base.
First of all, what is an unknown ingredient for you? Let's say you have a recipe dataset for Indian cuisine, and an unknown ingredient for such a dataset would be 'tamale'. I am not sure there is any way you could possibly pick up 'tamale' as an ingredient, given that your dataset does not contain any reference to such a product.
Now in your case, if you are able to create a dataset of recipes that contains all the possible ingredients and you are looking to provide related recipes,
simply tokenize your incoming message, use some POS features to identify nouns and/or verbs, and look them up in your knowledge base (create an index of all your recipes, searchable by each token; Elasticsearch is a really good example), then pick up the recipes that contain the ingredients the person is looking for.
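A minimal sketch of that idea, assuming spaCy for the POS tagging and a hypothetical Elasticsearch index called "recipes" with an "ingredients" field (both names are just placeholders):

```python
import spacy
from elasticsearch import Elasticsearch

nlp = spacy.load("en_core_web_sm")  # small English model with a POS tagger
es = Elasticsearch()                # assumes a local Elasticsearch instance

def find_recipes(message: str):
    # Keep only the nouns from the incoming message - ingredients are usually nouns
    doc = nlp(message.lower())
    candidates = [token.text for token in doc if token.pos_ == "NOUN"]

    # Look the candidate tokens up in the (hypothetical) "recipes" index,
    # which stores an "ingredients" field for every recipe
    result = es.search(
        index="recipes",
        body={"query": {"terms": {"ingredients": candidates}}},
    )
    return [hit["_source"] for hit in result["hits"]["hits"]]
```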
Now, coming to your word-embeddings approach: having a cluster of ingredients that are likely similar really depends on what kind of dataset you have.
Eggs are similar to flour if your dataset is mostly recipes on how to make cakes.
Eggs could also be closer to chicken in another dataset.
So yes, I suppose you could train a vector model with all related ingredients and find the similarity between two tokens.
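With gensim, for example, that could look roughly like this (the tiny corpus is only for illustration; the similarities you get back depend entirely on your recipe data):

```python
from gensim.models import Word2Vec

# Each "sentence" is a tokenized recipe; a real corpus would of course be much larger
recipes = [
    ["beat", "eggs", "with", "flour", "sugar", "and", "butter"],
    ["roast", "chicken", "with", "garlic", "eggs", "and", "olives"],
]

# vector_size is the gensim >= 4 argument name (older versions call it "size")
model = Word2Vec(recipes, vector_size=50, window=5, min_count=1)

# Similarity between two ingredient tokens, as learned from this corpus
print(model.wv.similarity("eggs", "flour"))
print(model.wv.most_similar("garlic", topn=3))
```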
Thanks, that was very helpful already! I think maybe 1) is what I’m looking for. But let me share an example with you.
Assuming I have a knowledge base of recipes. What would be the easiest way for an intent like this:
## intent:agent.find_recipe
- Search for [tomato](ingredient) recipe
- Find recipe containing [garlic](ingredient) for me
- Look up [potatoe](ingredient) recipes
- What can I cook with [olives](ingredient) and [feta](ingredient)?
In that scenario, if I had a complete list of ingredients, I could go for 3). But what if I want the model to work with new ingredients as well? Let's say the user wants a recipe containing 'tamale', so he's asking "What can I cook with tamale?", and I'd just need 'tamale' to be recognized as (ingredient) so I could query my knowledge base accordingly.
This may be a super trivial question, but since I’m still new here, I want to avoid spending days on gathering lists full of ingredients if there’s a simpler solution to this, e.g. using the structure of the utterances for the agent.find_recipe intent.
At the same time, it would be great if the solution also worked for other intents, e.g.
## intent:agent.add_favorite
- Add [apples](ingredient) to favorite food items
- I love dishes with [Chocolate](ingredient)
- ...
I’d really appreciate any examples, links, resources.
Intent classification: a user's intent (Search/Add) is usually expressed by a verb, e.g. "I am searching for a recipe with [ingredients]".
Ingredients, i.e. your entities, are nouns.
You should definitely use Rasa to train your classifier. But if you are unsure of the volume of entities you might have, and it really is just a lookup in a knowledge base for a recipe,
then for your first intent I'd have a story something like:
* find_recipe
- look_for_recipe
Train your classifier with examples like the ones you have above, but without tagging your entities.
Then create a custom action (look_for_recipe) where you take the incoming message and look it up in your knowledge base for any matches; lowercase the sentence first.
Your knowledge base, let's say, is a bunch of recipes.
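A rough sketch of such a custom action with the Rasa SDK; the query_knowledge_base helper is hypothetical and stands in for whatever lookup (Elasticsearch, database, …) you actually use:

```python
from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


def query_knowledge_base(message: str) -> List[Dict[Text, Any]]:
    """Hypothetical lookup - replace with your Elasticsearch/database query."""
    return []


class ActionLookForRecipe(Action):
    def name(self) -> Text:
        return "look_for_recipe"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        # Take the raw incoming message and lowercase it first
        message = (tracker.latest_message.get("text") or "").lower()

        # Match the message against the recipe knowledge base
        recipes = query_knowledge_base(message)

        if recipes:
            dispatcher.utter_message(text=f"I found {len(recipes)} recipes for you.")
        else:
            dispatcher.utter_message(text="Sorry, I couldn't find a matching recipe.")
        return []
```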
Your second case, however, looks more like a graph, where you are contextualizing an ingredient to a particular user.
The same pattern follows: create a custom action, look up the phrase in your knowledge base, check whether any of the tokens in the incoming sentence match an ingredient from your recipes, and create a graph relationship: User -> likes -> ingredient.
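With Neo4j, for example, storing that relationship could look roughly like this (connection details, labels and the user id are just placeholders):

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_favorite(user_id: str, ingredient: str) -> None:
    # Create (or reuse) the user and ingredient nodes and link them:
    # User -> LIKES -> Ingredient
    with driver.session() as session:
        session.run(
            "MERGE (u:User {id: $user_id}) "
            "MERGE (i:Ingredient {name: $ingredient}) "
            "MERGE (u)-[:LIKES]->(i)",
            user_id=user_id,
            ingredient=ingredient.lower(),
        )

add_favorite("user-42", "Chocolate")
```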
Use the embedding classifier.
Your classifier basically decides what kind of action to take: either a simple search on a recipe dataset, or the personalisation of a given ingredient for a particular user.
You can, however, filter your lookup by high-level entities such as cuisine, the list of which is exhaustive and therefore easier to achieve with the CRF.
Thanks, that's super interesting! So you wouldn't even try to extract ingredients directly in Rasa, but let e.g. Elasticsearch deal with the queries, correct?
In my particular case, I don’t have my own large knowledge base, but rather I’m using an external API that I do not control. In this example the API endpoint might look something like this: https://api.recipes.com/search?q=
In this case, would you still just pass the entire message to a custom action and then use POS tagging to extract nouns and apply some rule-based filters to "extract" the ingredient? E.g. in the examples above, I wouldn't want to send out a query like search?q=recipe.
Would you do it differently in this case?
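Roughly, this is what I have in mind; the endpoint is just the made-up example URL from above, and the stopword list is obviously incomplete:

```python
import requests
import spacy

nlp = spacy.load("en_core_web_sm")

# Nouns that are clearly not ingredients in this domain
DOMAIN_STOPWORDS = {"recipe", "recipes", "dish", "dishes", "meal", "dinner"}

def extract_ingredient_candidates(message: str):
    doc = nlp(message.lower())
    return [
        token.text
        for token in doc
        if token.pos_ == "NOUN" and token.text not in DOMAIN_STOPWORDS
    ]

def search_recipes(message: str):
    candidates = extract_ingredient_candidates(message)
    if not candidates:
        return []
    # "What can I cook with tamale?" -> https://api.recipes.com/search?q=tamale
    response = requests.get(
        "https://api.recipes.com/search", params={"q": " ".join(candidates)}
    )
    return response.json()
```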
One more detail about my project is that in this food example, I would have many intents that deal with ingredients, e.g. I might have:
find_recipe(ingredient)
add_favorite(ingredient)
lookup_nutrients(ingredient)
lookup_calories(ingredient)
get_wiki_article(ingredient)
…
All of these intents (potentially dozens) and associated actions would need to be able to identify / work with ingredients.
The first rule, I would say, is: research your dataset really well. If it is an API that provides you with recipes/nutrients/calories, make sure you read up on which ingredients they have information about. The list is of course going to be huge, but at least you have a starting point.
You can take this on an intent-by-intent basis as well. However, given that your information relies on an external 3rd-party API, you should start with some definitive entities for which you are sure to get an answer back.
This actually simplifies your case: you will need to train your CRF with a definitive list, since you are relying on 3rd-party data over which you don't have any control.
On your side, what you do have control over is contextualizing your data: for each user, their preferences/likes/dislikes, and ultimately suggesting recipes based on what they have liked before. You could do this with a graph database.
In my opinion, start small and continuously add new ingredients to your list.
Hope this helps, though I realise the problem for you now is having to manually list the ingredients; but without data, you can't build anything clever either. Since you don't control the underlying dataset, none of the approaches you or I mentioned above would work directly.
Maybe I don't understand the problem correctly, but I don't see where the problem is. You can train an NER with a CRF to detect any entity if you train it with similar sentences, such that 'tamale' is also detected. Then you can use your API with the found entity and, if there is a response/match, deliver the recipe; otherwise utter a default.
Thanks @souvikg10. Any reasoning why you think parsing an “ingredient” entity is not possible / not a good idea?
@datistiquo: That would be my preferred solution. The issue is that so far I haven't been able to get ner_crf to recognize all the different kinds of ingredients with good enough accuracy. Sometimes it works, sometimes it doesn't. I created an intent like the one mentioned above:
## intent:agent.add_favorite
- Add [apples](ingredient) to favorite food items
- I love dishes with [Chocolate](ingredient)
- Please add [tomatoes](ingredient) and [ice cream](ingredient) to favorites
- ...
I fed ~70 examples into this particular intent, with ~10 different sentence structures (1-3 ingredients, short + long sentences, ~50 different ingredients, etc.), but it's still not picking up new ingredients that weren't part of the training samples very well.
Any suggestions? Do I need way more training samples? But then I’m worried this one intent ends up having way more samples than all the others, leading to an unbalanced training dataset.
Happy to hear both of your thoughts and reasonings!
Try some of the features mentioned above. However, I can tell you this is highly probabilistic, and based on the pattern of your training samples it can generalise to something really horrible.
Add apples to my favourite food items.
If you are using examples like that in your training samples and generalise your CRF, you could end up with a generic pattern that says "Add ABC to my favourite food items.", and it will consider ABC a food item.
Ingredients are definitive, and if you are using the CRF, use it with a lookup table.
Yeah, I have this issue too. Although I have the same training sentence inside the data, with the same surrounding words, it fails when testing on a different entity… That is really strange, since you don't know the reason. Maybe it is overfitting, but how do you find out?
Extracting ABC would be totally fine for me. I could then make sure that ABC is a food item using the API.
@souvikg10: When you're talking about "generalise the NER patterns in your training sample", do you mean
1. adding more sample utterances, or
2. still referring to a lookup table?
If 2., how do lookup tables help with generalising? Aren't lookup tables just adding the individual items as new regex pattern features to the CRF? So all the additional features added through the lookup table would only help with identifying items from the list, right?
If 1., how would you generalise the patterns in your training data? Wouldn't it be bad if I generated 300 samples for this one intent, whereas all the other intents only have ~20 examples?
Or do you mean I should remove the features that are based directly on the (ingredient) token itself from the ner_crf features, e.g. not use the "lower", "suffix" and "prefix" features? I think that could work for this case, though I haven't tested it.
Maybe even add more features for the tokens before and after, so the model can't memorize the ingredients themselves but has to learn from the sentence structure instead. Not sure if that's what you meant, @souvikg10, but I could imagine something like the sketch below.
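Something like this as the features entry for ner_crf in the pipeline config (normally written in YAML; the feature names are from memory, so please double-check them against the CRFEntityExtractor documentation for your Rasa version):

```python
# Sketch of a ner_crf "features" setting: one feature list per position
# [previous token, current token, next token]. The idea is to drop the
# word-identity features ("low", "prefix*", "suffix*") for the current token
# so the CRF has to rely on context instead of memorizing ingredient strings.
ner_crf_features = [
    # previous token
    ["low", "title", "upper", "pos", "pos2"],
    # current token - no "low"/"prefix*"/"suffix*" features
    ["bias", "pos", "pos2", "digit", "pattern"],
    # next token
    ["low", "title", "upper", "pos", "pos2"],
]
```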
And the downside would be that it wouldn’t calculate these features for any other intents either. So I guess that won’t work either.
I was talking about point 2, and yes, you should play with the POS features of ner_crf to generalise the pattern of where an ingredient is most likely to occur in a sentence.
From what I understand, based on the number of intents you have related to ingredients, I am sure you can balance your dataset.
Yes, that is the point; that is why I trained entity examples with an empty intent, but @akelad doesn't recommend this. Maybe someone should state a way to train entities independently of intents, as stated in the title?
Yes, I do better if I remove the word features of the entity itself and use n-grams of the context words instead. But then I see that even a sentence from the training examples is not detected when tested on that same example… Also, it is really difficult to figure out which examples to add to the data…
Another idea that popped into my head would be to have two pipelines (two Rasa NLU models: one doing intent classification and one doing entity recognition); this way you remove the imbalance. You can use the two outputs and merge them into one, creating a dummy intent for the second pipeline.
Quite simply, train two models: one with a config for classification and another with a config for entity extraction.
Use two Interpreters to parse the same message, then compare and combine the results. You can do this at a particular state of the conversation as well. However, this isn't an out-of-the-box Rasa solution, rather a custom one.
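A sketch of how the two models could be combined, assuming the Rasa 1.x Interpreter API and two already trained NLU model directories (the paths are placeholders):

```python
from rasa.nlu.model import Interpreter

# One model trained only for intent classification, one only for entity extraction
intent_interpreter = Interpreter.load("models/intent_model")  # hypothetical path
entity_interpreter = Interpreter.load("models/entity_model")  # hypothetical path

def parse(message: str) -> dict:
    intent_result = intent_interpreter.parse(message)
    entity_result = entity_interpreter.parse(message)

    # Merge: take the intent from the first model and the entities from the second
    return {
        "text": message,
        "intent": intent_result.get("intent"),
        "entities": entity_result.get("entities", []),
    }

print(parse("What can I cook with tamale?"))
```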