What is the best, or the “Rasa”, way to recognize diverse entity types independently of any intent?
Let me use this shopping list bot example provided on GitHub. The bot can only identify a handful of food items (eggs, milk, and butter). What would be the intended way to enable the NLU to recognize a wide range of food items, e.g. by using a large food / recipe dataset containing thousands of ingredients?
So far I have these ideas, but don’t know which one would be the most appropriate:
- Just adding many examples to the NLU training data
- Adding a new entity type to an existing spaCy model
- Using lookup tables
My thoughts so far:
- Adding many examples: seems inappropriate, since entities and intents are trained together. I don’t want to oversample an intent just to make sure there is a phrase variant for every ingredient.
- Adding a new entity type to a spaCy model: typically not recommended, because you have to be very careful to avoid the “catastrophic forgetting” problem and you need to gather enough training samples.
- Lookup tables: could be the way to go, but I’d rather have a “word embedding” approach that picks up similar tokens. It would be hard to make sure every reasonable ingredient is in such a large, static list of food items.
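To make my concern about lookup tables concrete, here is a minimal sketch in plain Python. It is not Rasa's actual implementation, only my assumption of how list-based matching roughly behaves: the list is turned into one regular expression and every hit is tagged as a `food` entity. The ingredient list and the function names (`build_lookup_pattern`, `find_food_entities`) are made up for illustration.

```python
import re

# Hypothetical ingredient list; in practice it would be loaded from a
# large food / recipe dataset with thousands of entries.
INGREDIENTS = ["eggs", "milk", "butter", "olive oil", "oil"]


def build_lookup_pattern(items):
    """Compile one case-insensitive regex that matches any listed item."""
    # Longest entries first, so "olive oil" is preferred over "oil".
    ordered = sorted(set(items), key=len, reverse=True)
    alternation = "|".join(re.escape(item) for item in ordered)
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def find_food_entities(text, pattern):
    """Return every list match in `text` as a dict describing a food entity."""
    return [
        {
            "entity": "food",
            "value": match.group(0).lower(),
            "start": match.start(),
            "end": match.end(),
        }
        for match in pattern.finditer(text)
    ]


pattern = build_lookup_pattern(INGREDIENTS)

# Items on the list are found, regardless of casing.
found = find_food_entities("Add olive oil and Eggs to my list", pattern)

# An ingredient that is not on the list is silently missed.
missed = find_food_entities("Add jackfruit to my list", pattern)
```

In this sketch `found` contains "olive oil" and "eggs", while `missed` is empty. That second case is exactly what worries me about a static list: anything not listed is never recognized, however food-like the surrounding sentence is.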
Also: how should entity types be handled? Should I use only one “food” entity type, or multiple entity types such as “fruits”, “vegetables”, …?
Let’s say I want to build a chatbot that can identify any kind of ingredient and find me a recipe for it. What is the best practice for such a scenario? How can I make sure my bot identifies new, unknown ingredients as well as very common food items?