I have trained a bot to recognize an intent `show_me_X`, with examples like:

```
hello, show me [dresses](product)
please can you show me [skirts](product)?
```
Now, when I feed the bot a sentence similar to the examples above, but where the product is a completely unknown word for the bot (e.g., "show me cars" or "show me drones"), the result is not always good:
- In some cases the bot predicts the `nlu_fallback` intent and extracts the unknown word as a `product` entity.
- In other cases the bot predicts the `show_me_X` intent and extracts the unknown word as a `product` entity.
The second case seems to happen when the unknown word shares a root with one of the known products (e.g., drone and dress). I wonder if this is due to using a char-based `CountVectorsFeaturizer`, although the docs say it is only used for intent classification and response selection:
```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 70
    use_masked_language_model: True
  - name: FallbackClassifier
    threshold: 0.7
  - name: EntitySynonymMapper
```
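To illustrate my suspicion about the char-based featurizer, here is a rough, standalone sketch of how `char_wb` n-grams (as configured above, n = 1..4) make "drone" and "dress" look similar; the padding convention is my assumption of how the analyzer behaves, not Rasa's actual implementation:

```python
def char_wb_ngrams(word, n_min=1, n_max=4):
    """Character n-grams of a space-padded word, roughly mimicking a
    char_wb analyzer (assumption: one space of padding on each side)."""
    padded = f" {word} "
    return {
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    }

# The two words share several n-grams, e.g. 'dr' and ' dr',
# which could explain why 'drones' still triggers show_me_X.
shared = char_wb_ngrams("dress") & char_wb_ngrams("drone")
print(sorted(shared))
```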
If the featurizer is the problem, I could remove it, but I would prefer not to, since it helps the bot understand slight variations of words that are not entities. So, how could I prevent my bot from extracting a `product` entity when the user did not provide a known product? Would combining several entity extractors help?
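One workaround I am considering is to keep the extractor as-is and reject unknown values afterwards. This is not part of the Rasa API, just the core check I would put inside a custom action or slot validation action (the entity dicts below mimic the `entity`/`value` keys Rasa produces; `KNOWN_PRODUCTS` is my own list):

```python
# Values seen in the training data (hypothetical list for illustration).
KNOWN_PRODUCTS = {"dresses", "skirts"}

def validated_product(entities):
    """Return the extracted product value only if it is a known
    product; otherwise return None so the slot stays unset."""
    for ent in entities:
        if ent.get("entity") == "product" and ent.get("value") in KNOWN_PRODUCTS:
            return ent["value"]
    return None

print(validated_product([{"entity": "product", "value": "drones"}]))  # None
print(validated_product([{"entity": "product", "value": "skirts"}]))  # skirts
```

I am unsure whether this kind of post-filtering is more idiomatic than, say, a lookup-table-based extractor, which is part of why I am asking.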