Intent and Entity Design - Organization Dilemma

Hi everyone -

So I’ve been going back and forth for a while on a design consideration regarding the organization of intents and entities in my chatbot. I thought this would be a good topic to post and open a discussion on, in case anyone else has encountered this dilemma. I think proper intent distinction and organization is key to building a good, scalable chatbot.

I will use my situation as an example, and hopefully everyone can relate it back to their own use cases. One component of the bot I am building will be to search for specific foods, cuisines, and drinks, with the ultimate goal being for the chatbot to recommend a restaurant or bar/lounge/pub/etc. where the user can get the requested thing. For example, if the user says “I want to get a pizza”, it would recommend a restaurant that has pizza on the menu. Easy enough!

The dilemma I have been having is how to organize the intents effectively to put as little strain on the NLU as possible, while making sure to fulfill the request at hand. Originally, I had two separate intents: search_food and search_drink. The reason for separating these initially was to ensure that if the user said something generic like “I’m hungry” or “I want to party”, they could be associated with search_food and search_drink accordingly.

The problem I am facing now is that the NLU doesn’t see a difference between “burger” and “beer” unless every food and drink is tagged as a food or drink entity in the training. So for example, if “whiskey” isn’t tagged in the training, it may simply think “whiskey” is a food if the user says “I want a whiskey” because there is no difference between that and “I want a burger” besides the entity. Just to be clear, I didn’t expect the NLU to pick up on this difference - it just made me realize that I may need to revisit the way I’m organizing intents.

Assuming I combine into one intent, search_food_drink, I just now have to consider generic requests. Right now, my thought is to just make keywords like “hungry” and “starving” synonyms of “eat” so with a little bit of logic, the backend can understand to recommend a restaurant. On the other side, I would make things like “drunk” a synonym for “drink” to recommend a bar, lounge, etc. I just don’t know if this would be considered a good practice or if it is a stretch for synonyms!
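For what it’s worth, if you go the synonym route, the legacy Rasa markdown NLU format lets you map several surface forms to one entity value right in the training data. A minimal sketch (the intent and entity names here are illustrative, not from an actual bot):

```
## intent:search_food_drink
- I want to [eat](craving)
- I'm [hungry](craving:eat)
- I'm [starving](craving:eat)
- I want to [drink](craving)
- I want to get [drunk](craving:drink)
```

With `[hungry](craving:eat)`, the extracted entity value is normalized to `eat`, so the backend only has to branch on the resolved craving value (eat vs. drink) to decide between a restaurant and a bar.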

I originally thought I could have some sort of “feeling” or “desire” intent with things like “I’m hungry” or “I want to grab a drink”, but I imagine that getting messy and really confusing to handle.

Any thoughts / similar encounters? I’m interested to hear how others have tackled situations like this regarding the scope of intents.

Thanks! - Chris


Chris, I’m not sure I completely understood what you’re saying, but I think you’re facing a common dilemma. The NLU can classify intents and the entities in them. The more granular the classification, the more training data you need to provide for each particular case. If you go that route, all your intelligence lives in the NLU/AI, and what feeds that intelligence is the training data for the NLU and AI (Rasa Core, via stories).
Or you can have a generic intent, like search_food_drink, and push all the intelligence and intent classification/processing into your backend via custom actions.

Both have drawbacks. If you use the custom actions for the backend and push all of the intelligence there, you’re kind of missing the point of Rasa and AI.

On the other hand, if you use a different intent/entity for every variation of a question, and a specific “utter_action” as a response, that doesn’t scale either as you end up with dozens of intents and utter_actions.

Thanks for the feedback, Leo. That’s exactly my challenge: finding that balance.

Also, I think it boils down to determining whether search_food and search_drink are too similar to be separate intents.


  • I want pizza
  • I’m craving a steak
  • I’m in the mood for Chinese food
  • Where can I get swordfish


  • I could go for a martini
  • I want a beer
  • Where can I get a margarita?
  • I’m in the mood for a scotch

The way you ask for both is clearly very similar, which leads me to question if they belong in one intent together.

The only problem is, then you have generic inquiries like…


  • I’m hungry
  • I need to eat
  • Where can I grab a bite to eat?


  • I want to drink
  • Where can I grab a drink?
  • I want to get drunk

… which seem like they’d need their own intents to be handled properly.

Do we need to add a different intent for every entity, or is it fine to go with one intent for different entities?

I am also facing the same confusion… My use case involves 3 entities.

The user can mention any of these entities, phrased any way, in their sentences.

Is it fine to add all the entities under one intent while training my NLU, or do I need to create different intents for each entity?

I believe it is optimal to have one intent. You do not need to have separate intents for every entity. If they are related in context, I would say to trigger them via one intent. For my example above, I ended up using one intent for food and for drink because from an NLU perspective, they are identical (just different entities to your point).


I think entity extraction is independent of intent. So as long as you provide enough training data for ner_crf to pick out the entity you want, it doesn’t make a difference which intent it is in. So you could have intents like ask_food, ask_drink, ask_wifi, and in each one you say “my name is Jim, I want $variable” (the variable would be different for different intents), but ner_crf should still be able to pick out “Jim” as a name. Try it!
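To illustrate that point, here is a sketch in legacy Rasa markdown (intent and entity names are made up for illustration): the same `name` entity appears across several intents, and ner_crf learns to extract it from the surrounding context regardless of which intent the example belongs to.

```
## intent:ask_food
- my name is [Jim](name), I want [pizza](food)

## intent:ask_drink
- my name is [Jim](name), I want a [beer](drink)

## intent:ask_wifi
- my name is [Jim](name), I want the [wifi](service)
```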

Good point, @lgrinberg. This is something I am also trying to understand a bit more deeply. I have a bot with about 400 intents and about 120K examples. Most of the time, I notice the entities are picked up correctly, but not the intents. I can’t really figure out how this is working. Do you think I should focus more on entities and design intents around them? I have been thinking the other way around. Thank you.

You could also have a single intent, search, and provide examples with different entities, type_of_food, type_of_drink, and so on, and then act differently in your stories depending on which slots are set.
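Sketching that idea in the legacy Rasa story format (the entity and utterance names here are assumptions for illustration), the stories branch purely on which slot got filled:

```
## user asks for food
* search{"type_of_food": "pizza"}
  - utter_recommend_restaurant

## user asks for drink
* search{"type_of_drink": "beer"}
  - utter_recommend_bar
```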

A bit too vague at this point. Why do you have 400 intents? Are you extracting a different entity from every intent (i.e. 400 intents == 400 entities)? I’d imagine that it’s hard for an NLU model to identify each one of the 400 intents correctly.
Can you please give more details?

I have about 200+ entities - e.g. “savings”, “balance”, “interestRate”, “apply”, etc. And about 400 intents, based on the combinations - e.g. “SavingsAccountInterestRate”, “SavingsAccountHowToApply”, “SavingsAccountBalance”, etc. Savings is just one type of account; if I have 5 types of accounts, I might end up with 5x the intents. So, in short, I use a set/combination of entities to arrive at / understand the intent.

Hey @Egalite123, I was facing a similar challenge recently. I needed to give the NLU the capacity to recognize similar yet hierarchical intent classifications. Just as you were saying, the intents “SavingsAccountInterestRate” and “SavingsAccountHowToApply” might share a few words and some context, so they are ‘alike’. The pipeline component Rasa designed for this is intent_classifier_tensorflow_embedding. Basically, it embeds the input space (your text entry and its features) and the label space (the intent labels, “SavingsAccount…”) into the same space and tries to find similarities between them. And if you split your labels in a hierarchical manner, by setting "intent_split_symbol": "+", the architecture might pick up those hierarchical patterns. In your example, that means labeling them savings+account+interestrate and savings+account+howtoapply, denoting a 3-level classification structure.
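For reference, the hierarchical splitting described above is switched on in the NLU pipeline config roughly like this (a sketch for the legacy tensorflow-embedding pipeline; check the docs for your Rasa version):

```yaml
language: en
pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  intent_tokenization_flag: true
  intent_split_symbol: "+"
```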

What you can do (as I did, and it worked) is label your intents regardless of which entities you are trying to extract. That is: your intents should be as general as possible with respect to their entities. Say you have 3 different types of accounts, so you have an intent that looks like “SavingsAccountMoreDetail”. A few examples for that intent might be:

"Yes I wanna know more about [APRO](entity: account-type)"
"Yes more info on [APAG](entity: account-type)"
 "More details please"

In this way, you can then write your stories based on which entities the NLU has picked up:

## story a
* SavingsAccountMoreDetail{"account-type": "APRO"}
  - utter_detail_apro

## story b
* SavingsAccountMoreDetail{"account-type": "APAG"}
  - utter_detail_apag

Or, depending on how many entities you have, this might be too lengthy, and you can generalize with an Action that does the classification for you, e.g.

## story a
* SavingsAccountMoreDetail{"account-type": "APRO"}
  - action_classify_account
  - slot{"account.type": "free"}
  - utter_free_type

## story b
* SavingsAccountMoreDetail{"account-type": "APAG"}
  - action_classify_account
  - slot{"account.type": "paid"}
  - utter_paid_type
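The logic inside such an action_classify_account could be as simple as a dictionary lookup. A sketch in plain Python (the APRO/APAG codes come from the examples above; the free/paid mapping mirrors the stories, and everything else is a hypothetical assumption):

```python
# Hypothetical mapping from raw account-type entity values to categories.
# APRO -> "free" and APAG -> "paid" mirror the stories above; extend as needed.
ACCOUNT_CATEGORIES = {
    "APRO": "free",
    "APAG": "paid",
}

def classify_account(account_type):
    """Return the category for a raw account-type value, or None if unknown."""
    return ACCOUNT_CATEGORIES.get(account_type)
```

In a real Rasa custom action you would read the entity off the tracker and return a SlotSet event with this value.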

And there’s also a third way to do this, which is by creating custom slots, but that’s overhead in this case: by creating account.type as a categorical slot, each category is one-hot encoded and thus has its own “feature axis” direction.
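For completeness, a categorical slot of that kind would be declared in the domain roughly like this (slot and value names are taken from the example above; exact syntax depends on your Rasa version):

```yaml
slots:
  account_type:
    type: categorical
    values:
    - free
    - paid
```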

The general idea to keep in mind is that entity extraction is a separate process from intent classification. They are run by different components, and you can think of it as one process not impacting the other**. It also helps to think about how you are handling decision making; designing dialog flows also helps tremendously in this classification phase.

** That’s not entirely “true”, because entities (as entity objects) are not used as inputs to intent classification. What is used is a featurized representation of your input text, and in that input text your entity values are broken down just like any other words.


This would be great, but the only problem is that, from a design perspective, if someone says “I’m hungry”, we don’t yet have a slot filled, so with a single intent we can’t know to ask a proper clarifying question such as “What type of food are you looking for?” This was a big part of the dilemma: things get complicated when we think about the language the bot will use.

@Egalite123 Are you working on this for a banking institution? I’m in the same space, would be interesting to connect and share ideas.

@Migs86 @Egalite123 So am I. Would be extremely interesting to share ideas, I’m stuck on the same problem.

I have come up with a solution that is a bit counter-intuitive but works well for me. Group the examples that are similar into a common intent; let’s call it search_food_drinks. Now, add the remaining examples that are sufficiently different from each other to separate intents, search_food and search_drink. Since entity extraction is independent of intent classification, it will remain unaffected. On the other hand, “I am hungry” and “I want a drink” can now be classified easily.
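In legacy Rasa markdown, that grouping might look like the following sketch (examples borrowed from earlier in the thread; entity names are illustrative):

```
## intent:search_food_drinks
- I want [pizza](food)
- I could go for a [martini](drink)
- Where can I get a [margarita](drink)?

## intent:search_food
- I'm hungry
- Where can I grab a bite to eat?

## intent:search_drink
- I want to get drunk
- Where can I grab a drink?
```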

I would also suggest running the evaluation script and taking a look at the confusion matrix. The matrix is a clear indicator of which intent is being confused with which.
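Depending on your Rasa version, the evaluation can be run from the command line; in recent versions something like the following produces an intent confusion matrix image among its reports (the data path is illustrative):

```
rasa test nlu --nlu data/nlu.md
```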

That is a lot of data. However, I have found the confusion matrix insanely useful in such scenarios. It will help you figure out why intents are being confused with other intents.

Agree, the confusion matrix is extremely useful. There are some instances which really confuse me, though. E.g. “Tell me about @[current] @[account]” gets correctly classified, whereas the variation “I want to open @[current] @[account]” does not get its intent classified correctly, even though entities are picked up correctly in both cases. The confusion matrix does point out that the 2nd utterance is wrongly classified, but I’m not sure how to go about fixing it. Later I found that the word “want” seems to be the culprit. Without that word, classification is perfect. But then the issue remains if the user actually uses “want” in their utterance.