Intent classification, intents with and without entities

kormoczi · June 4, 2021, 7:47am

Hi Everybody,

I am still new to Rasa (and to the forum as well), but I will try to explain my problem as clearly as possible. Sorry for the long post…

I would like to distinguish intents whose texts are similar, but where one of them has entities. I put together a simple example (part of a bigger project):

nlu:
- intent: acquaintance
  examples: |
    - Who are you?
    - What are you?
- intent: boss
  examples: |
    - Who is your boss?
    - Who is your master?
    - Who is your owner?
    - Who is the boss?
- intent: famous
  examples: |
    - Who is (PERSON)?
    - Who is [](PERSON)?
    - Who is [Arnold Schwarzenegger](PERSON)?
    - Who is [Michael Jackson](PERSON)?
    - Who is [Albert Einstein](PERSON)?

The config is the following:

language: en

pipeline:
- name: SpacyNLP
  model: "en_core_web_lg"
  case_sensitive: False
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
  model_confidence: linear_norm
- name: SpacyEntityExtractor
  dimensions: ["PERSON"]
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  retrieval_intent: acquaintance
  constrain_similarities: true
  model_confidence: linear_norm
- name: ResponseSelector
  epochs: 100
  retrieval_intent: boss
  constrain_similarities: true
  model_confidence: linear_norm
- name: ResponseSelector
  epochs: 100
  retrieval_intent: famous
  constrain_similarities: true
  model_confidence: linear_norm
- name: FallbackClassifier
  threshold: 0.7
  ambiguity_threshold: 0.1

policies:
- name: MemoizationPolicy
- name: TEDPolicy
  max_history: 5
  epochs: 100
  constrain_similarities: true
  model_confidence: linear_norm
- name: RulePolicy

What I would like to achieve is the following: if a message contains a PERSON entity, it should be classified as “famous”; otherwise it can be either “acquaintance” or “boss”. I do not know whether my NLU intent examples are wrong, the pipeline has problems, or something else, but when I play with ‘rasa shell nlu’, I get the following results:

Message #1: “Who was George Washington?” NLU result OK - intent: famous, confidence: 1.0, entity extracted (by both DIETClassifier and SpacyEntityExtractor; I know this setup is not really recommended…)

Message #2: “Who is Elsa?” NLU result OK - intent: famous, confidence: 0.84, entity extracted (by Spacy)

Message #3: “Who is Mozart?” NLU result not really OK - intent: nlu_fallback, entity extracted (by Spacy). The confidence for intent famous is 0.69, which is not that bad, but I still do not really understand why this is the result.

Message #4: “Who is Freddie Mercury?” NLU result BAD - intent: boss, confidence: 1.0, entity not extracted. This is not good, but I can accept that the classification goes wrong when no entity is extracted. Still, can I do something here?

Message #5: “Who is Steve Buscemi?” NLU result VERY BAD - intent: boss, confidence: 0.74, entity extracted (by Spacy), intent famous confidence: 0.25. This I cannot understand at all. An entity was extracted, so why can't it help classify the intent better?
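Message #3 is likely explained by the FallbackClassifier settings in the config above (threshold: 0.7, ambiguity_threshold: 0.1). As documented for Rasa, FallbackClassifier predicts nlu_fallback when the top intent's confidence is below `threshold`, or when the gap between the top two intents is smaller than `ambiguity_threshold`. The sketch below is a plain-Python re-implementation of that rule for illustration only, not Rasa's own code; the function name `apply_fallback` is hypothetical, and apart from the reported 0.69 and 0.84, the confidences for the other intents are made up.

```python
def apply_fallback(intent_ranking, threshold=0.7, ambiguity_threshold=0.1):
    """Mimic the FallbackClassifier decision on a list of
    (intent_name, confidence) pairs sorted highest confidence first."""
    top_name, top_conf = intent_ranking[0]
    # Rule 1: the best intent is not confident enough -> fallback.
    if top_conf < threshold:
        return "nlu_fallback"
    # Rule 2: the two best intents are too close to call -> fallback.
    if len(intent_ranking) > 1:
        second_conf = intent_ranking[1][1]
        if top_conf - second_conf < ambiguity_threshold:
            return "nlu_fallback"
    return top_name


# Message #3: famous scored 0.69 (the split of the remaining 0.31 is invented).
print(apply_fallback([("famous", 0.69), ("boss", 0.21), ("acquaintance", 0.10)]))
# Message #2: famous scored 0.84, which clears both checks.
print(apply_fallback([("famous", 0.84), ("boss", 0.10), ("acquaintance", 0.06)]))
```

Under this reading, 0.69 falls just below the 0.7 threshold, so message #3 becomes nlu_fallback. Lowering `threshold` slightly would let it through, but it would not help message #5, where boss at 0.74 already clears both checks.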

So what am I doing wrong, and what should I do?

Thank you and best regards, Csaba

harloc · June 4, 2021, 12:27pm

Are all these names you mentioned in your examples part of your training data?

The point is that you use the CountVectorsFeaturizer, so the Rasa NLU AI does not have information from the outside world, such as a pretrained embedding. It just learns from your examples. So if the AI never encountered names like Freddie Mercury or Steve Buscemi, it simply cannot handle them accurately.
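To illustrate why unseen names carry no signal for a count-based featurizer, here is a simplified word-level bag-of-words sketch built only from the training examples in the original post. It is a stand-in for the idea, not Rasa's actual CountVectorsFeaturizer (which in the config above is also paired with a char_wb n-gram featurizer that can partially match unseen words); the helper names `tokenize`, `build_vocabulary` and `featurize` are hypothetical.

```python
import re
from collections import Counter

# Training examples from the nlu section, with entity markup removed
# and the two placeholder examples left out.
TRAINING_EXAMPLES = [
    "Who are you?", "What are you?",
    "Who is your boss?", "Who is your master?",
    "Who is your owner?", "Who is the boss?",
    "Who is Arnold Schwarzenegger?", "Who is Michael Jackson?",
    "Who is Albert Einstein?",
]


def tokenize(text):
    # Lowercase and keep only word characters.
    return re.findall(r"\w+", text.lower())


def build_vocabulary(examples):
    # The vocabulary contains only words seen during training.
    return sorted({tok for ex in examples for tok in tokenize(ex)})


def featurize(text, vocabulary):
    # Count vocabulary words; out-of-vocabulary words are silently dropped.
    counts = Counter(tokenize(text))
    return {word: counts[word] for word in vocabulary if counts[word] > 0}


vocab = build_vocabulary(TRAINING_EXAMPLES)
print(featurize("Who is Freddie Mercury?", vocab))  # {'is': 1, 'who': 1}
print(featurize("Who is the boss?", vocab))  # {'boss': 1, 'is': 1, 'the': 1, 'who': 1}
```

For “Who is Freddie Mercury?” only “who” and “is” survive, and both words appear in every boss example as well as every famous one, so nothing in these word features points specifically to famous.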

So you have two options:

1. Greatly extend your examples, so that most famous names are encountered during training and the AI learns all these names that way.
2. Use a pretrained embedding, based on Wikipedia or something comparable, so these names will “make more sense” to the AI. You might still have to increase the number of examples slightly, as in option 1.

Hope that will help you.

kormoczi · June 4, 2021, 1:15pm

Most probably I cannot expand my examples that much, which is why I use SpacyEntityExtractor for the names (“PERSON”). I think the NLU results make it clear that Spacy was able to identify “Steve Buscemi” as a name (“PERSON”). But the intent classification is still totally off…

harloc · June 11, 2021, 6:51am

Intent classification and entity extraction are not intertwined: the extracted entities have no influence on the intent classification. Maybe you can tweak your pipeline, for example by removing some of the CountVectorsFeaturizer components.
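Since the extracted entities do not feed back into the intent prediction, one possible workaround is to apply the rule from the original question after NLU: if a PERSON entity was extracted, switch the intent to famous. The sketch below works on a plain dict shaped like the output that ‘rasa shell nlu’ prints (`intent`, `entities`, `intent_ranking`). The function name `override_intent_by_entity` is hypothetical, and wiring it into Rasa (for example as a custom component placed after the entity extractors) is not shown.

```python
import copy


def override_intent_by_entity(parse_result, entity_type="PERSON",
                              target_intent="famous"):
    """Return a copy of an NLU parse result whose intent is forced to
    `target_intent` when an entity of `entity_type` was extracted."""
    result = copy.deepcopy(parse_result)  # leave the caller's dict untouched
    has_entity = any(e.get("entity") == entity_type
                     for e in result.get("entities", []))
    if not has_entity:
        return result
    # Reuse the classifier's own confidence for the target intent, if ranked.
    ranking = {r["name"]: r["confidence"]
               for r in result.get("intent_ranking", [])}
    result["intent"] = {"name": target_intent,
                        "confidence": ranking.get(target_intent, 0.0)}
    return result


# Roughly message #5: boss won, but spaCy extracted a PERSON entity.
message_5 = {
    "text": "Who is Steve Buscemi?",
    "intent": {"name": "boss", "confidence": 0.74},
    "entities": [{"entity": "PERSON", "value": "Steve Buscemi",
                  "extractor": "SpacyEntityExtractor"}],
    "intent_ranking": [{"name": "boss", "confidence": 0.74},
                       {"name": "famous", "confidence": 0.25}],
}
print(override_intent_by_entity(message_5)["intent"])
```

Keeping the classifier's own (low) confidence for famous is just one design choice; a rule-based override could equally report 1.0. Note that the rule cannot help message #4, where no entity was extracted at all.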

darshanpv · February 5, 2022, 10:32pm

You can get the complete intent classification and entity extraction engine using Rasa.
