A test user deliberately entered the nonsense phrase “soap on a rope.” I would have preferred that it be classified as out of scope; instead, it was classified as “intent_A” with 99% confidence!
intent_A has the word “rope” in one training example, and “rope” does not appear in any other intent. “soap” does not appear in any training example, while “on” and “a” appear in many.
So presumably it was the “rope” token that pulled the prediction toward intent_A. But why 99% confidence?
I have reproduced this with other nonsense phrases containing a single word that appears in only one intent: that intent is consistently predicted with 95%+ confidence.
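For reference, here is roughly how I check these phrases against a locally served model (this assumes the trained model is running via `rasa run --enable-api` on the default port 5005; apart from “soap on a rope”, the phrases below are just placeholders for the other nonsense inputs):

import requests

# Parse endpoint of a locally running Rasa server
# (started with `rasa run --enable-api`, default port 5005).
PARSE_URL = "http://localhost:5005/model/parse"

# "soap on a rope" is the real test phrase; the others stand in for
# the other nonsense phrases I tried.
phrases = [
    "soap on a rope",
    "placeholder nonsense phrase one",
    "placeholder nonsense phrase two",
]

for text in phrases:
    # POST /model/parse returns the predicted intent and its confidence.
    result = requests.post(PARSE_URL, json={"text": text}).json()
    intent = result["intent"]
    print(f"{text!r} -> {intent['name']} ({intent['confidence']:.2f})")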
Any suggestions on what I should do?
My pipeline:

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  constrain_similarities: true
- name: FallbackClassifier
  threshold: 0.5
  ambiguity_threshold: 0.1