Unknown word extracted as entity

I have trained a bot to recognize an intent show_me_X, with examples like:

hello, show me [dresses](product), please
can you show me [skirts](product)?

Now, when I feed the bot a sentence similar to the examples above, but where the product is a completely unknown word for the bot (e.g., show me cars or show me drones), the result is not always good:

  • In some cases the bot predicts the nlu_fallback intent and still extracts the unknown word as a product entity.
  • In other cases the bot predicts the show_me_X intent and extracts the unknown word as a product entity.

The second case seems to happen when the unknown word shares a root with one of the known products (e.g., drones and dresses, which share character n-grams like "dr" and "es"). I wonder if this is due to the char-based CountVectorsFeaturizer, although the docs say it is only used for intent classification and response selection:

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 70
  use_masked_language_model: True
- name: FallbackClassifier
  threshold: 0.7  
- name: EntitySynonymMapper

If the problem is the featurizer, I could remove it, but I would prefer not to, since it helps the bot understand slight variations of words that are not entities. So, how could I prevent my bot from extracting a product entity when the user did not provide a known product? Would combining several entity extractors help?
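
For reference, this is the kind of combination I had in mind (just a sketch, not tested; the lookup entries are made up): let DIETClassifier keep doing intent classification but turn off its entity recognition, and add a RegexEntityExtractor that only tags products listed in a lookup table, so words the bot has never seen would not be extracted:

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 70
  use_masked_language_model: True
  entity_recognition: False    # intents only; entities come from the extractor below
- name: RegexEntityExtractor
  use_lookup_tables: True      # only tag values listed in the product lookup table
  use_regexes: False
- name: FallbackClassifier
  threshold: 0.7
- name: EntitySynonymMapper

with a lookup table for the product entity in the NLU data along these lines:

nlu:
- lookup: product
  examples: |
    - dresses
    - skirts

The trade-off, as far as I understand it, is that only exact matches against the lookup table get extracted, so misspellings or newly added products would need to be kept in the table.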

Hi @humcasma :wave: How many training examples do you have in your NLU data for your product entity?

Hi @m.vielkind! As part of my test I am using a training data generator. Since I have quite a few products, quite a few entities, and quite a few ways of expressing the intent, I am generating around 1000 training examples.