OOV_token not working

Hi, there are problems with training the chatbot. When unknown words or sentences are entered, the chatbot still finds intents with a confidence higher than 0.3, and the answers for the intent with the highest confidence are chosen and displayed.

According to the manual, the OOV_token option of the CountVectorsFeaturizer can be used to recognize unknown words. The following change was made in config.yml:

```yaml
pipeline:
  - name: SpacyNLP
    case_sensitive: false
  - name: SpacyTokenizer
    use_lemma: false
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    OOV_token: oov
    token_pattern: (?u)\b\w+\b
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
```

An intent was created with 4 NLU example sentences, but nothing changed during testing. The answers of the highest-rated intent are still displayed for unknown words or sentences, and the intents that are found still have a confidence higher than 0.3. How can this problem be solved?

The OOV_token is used to represent unknown words during prediction. It doesn't recognize them or anything similar. E.g. if you have a sentence full of unknown words, the classifier is able to see that there are many unknown words and makes its prediction based on that.
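As far as I know, the OOV_token also only has an effect if the token itself shows up in your training data (or is produced via the OOV_words option of the CountVectorsFeaturizer), otherwise it never becomes part of the vocabulary the classifier is trained on. A minimal sketch of such training examples, assuming the OOV_token: oov from your config and the Rasa 2.x YAML training data format; the intent name is only an illustration:

```yaml
nlu:
- intent: out_of_scope
  # the literal token "oov" stands in for words the bot has never seen
  examples: |
    - oov
    - oov oov oov
    - can you oov the oov for me
    - I would like to oov something
```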

I think the problem in your case is rather a lack of training data. In my opinion it could also help to add an out_of_scope intent with messages that are out of scope, so that the DIETClassifier can learn which messages shouldn't be mapped to any of the "in scope" intents.
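A minimal sketch of what that could look like, assuming the Rasa 2.x YAML format; the example sentences, the rule, and the response name utter_out_of_scope are only illustrations (the intent and the response also need to be listed in domain.yml):

```yaml
nlu:
- intent: out_of_scope
  examples: |
    - what is the weather on the moon
    - order me a pizza
    - tell me a bedtime story
    - how tall is the Eiffel Tower

rules:
# requires the RulePolicy to be part of your policies
- rule: Answer anything the bot is not built for
  steps:
  - intent: out_of_scope
  - action: utter_out_of_scope
```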

Check out the documentation here