UserWarning: 'oov' configured but not found in ResponseSelector training examples

I can’t figure out this warning, which I get both when validating data and when training a new model: “The out of vocabulary token ‘oov’ was configured, but could not be found in any one of the ResponseSelector training examples. All unseen words will be ignored during prediction.”

In the Rasa OS documentation, I can’t find anything about what the ResponseSelector needs for the ‘oov’ token.

I’ve also looked at the Rasa demo bot on GitHub (RasaHQ/rasa-demo: Sara, the Rasa Demo Bot), but couldn’t figure out how to set up a response selector that would resolve the warning.

So:

  1. What training examples do I have to add on top of the ones I already have, such as “- ik vind oov_niet_voor_bedoeld_ niet leuk”?
  2. Which ResponseSelector should be used for this OOV token?

------- Configuration and data ----- Rasa OS : 3.1.7

data/nlu.yml:

..... (many intents before and after; only the intent with the OOV token is shown here):
- intent: Iets anders dan ik voor ben bedoeld
  examples: |
    - ik vind oov_niet_voor_bedoeld_ niet leuk
    - ik vind oov_niet_voor_bedoeld_ vies
    - ik vind oov_niet_voor_bedoeld_ niet lekker
    - ik lust geen oov_niet_voor_bedoeld_
    - houd je van oov_niet_voor_bedoeld_
    - wat is oov_niet_voor_bedoeld_
    - wie hebben v
    - wie zijn oov_niet_voor_bedoeld_
    - wat vind je van oov_niet_voor_bedoeld_
    - wat vind jij mooi oov_niet_voor_bedoeld_
    - wat vind je oov_niet_voor_bedoeld_
    - vind je oov_niet_voor_bedoeld_ goed
..... (and so on, many more examples with oov)

config:

recipe: default.v1
language: nl
policies:
- name: AugmentedMemoizationPolicy
  max_history: 5
- name: RulePolicy
  core_fallback_threshold: 0.3
  core_fallback_action_name: action_default_fallback
  enable_fallback_prediction: false
- name: TEDPolicy
  max_history: 5
  epochs: 80
  constrain_similarities: true
  model_confidence: softmax
pipeline:
  - name: WhitespaceTokenizer
    intent_tokenization_flag: true
    intent_split_symbol: +
  - name: RegexFeaturizer
    case_sensitive: false
    use_word_boundaries: true
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: word
    min_ngram: 1
    max_ngram: 3
    OOV_token: oov_niet_voor_bedoeld_
    use_shared_vocab: false
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    lowercase: true
    min_ngram: 1
    max_ngram: 4
  - name: RegexFeaturizer
    case_sensitive: false
  - name: DIETClassifier
    epochs: 80
    constrain_similarities: true
    model_confidence: softmax
    entity_recognition: true
  - name: ResponseSelector
    epochs: 80
    constrain_similarities: true
    model_confidence: softmax
    retrieval_intent: Q&A
  - name: EntitySynonymMapper
  - name: FallbackClassifier
    threshold: 0.45
    ambiguity_threshold: 0.1

Please use markdown to format your posts.

I don’t think OOV will help with your pipeline, and I would remove it.

In my experience, OOV doesn’t work except in some niche use cases. Your pipeline has 4 featurizers (which is normal). Only one of them uses OOV, so the other 3 don’t factor it in. The end result is that the OOV signal from that one featurizer is outweighed by the others, so there is effectively no value in using OOV.
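
As a minimal sketch of that suggestion, the word-level featurizer from the config above would look roughly like this with the `OOV_token` line dropped. Every other setting is copied from the original post; the rest of the pipeline is assumed to stay unchanged. Note that any `oov_niet_voor_bedoeld_` placeholders left in `nlu.yml` would then be treated as an ordinary word rather than as a stand-in for unseen words.

```yaml
pipeline:
  # ... tokenizer and the other featurizers stay as they are ...
  - name: CountVectorsFeaturizer
    analyzer: word
    min_ngram: 1
    max_ngram: 3
    # OOV_token line removed: words not seen during training
    # are simply ignored by this featurizer at prediction time
    use_shared_vocab: false
```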

Greg, thank you for your reply.

Could you give some more explanation?

Because I wonder:

  1. Why could the other featurizers reduce the amount of training data the way OOV does?
    As far as I understand, each token not seen during training is replaced by the OOV token. So, instead of multiple examples like “I don’t like school”, “I don’t like noise”, “I don’t like peanut butter” for an out-of-scope intent, I only have to give one example, “I don’t like oov”. Rasa will then be able to separate this out-of-scope intent from the intent “I don’t like animal”, which has examples with animal names in it, like “I don’t like dogs”.
  2. How could the other featurizers outweigh the OOV word featurizer? Note that the Rasa demo bot uses the OOV featurizer the same way I do. So why did Rasa incorporate OOV in the demo bot?
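
The idea in point 1 can be sketched as NLU data like the following. The intent names `out_of_scope` and `dislike_animal` are hypothetical and only for illustration, and `oov` stands for whatever is configured as `OOV_token` in the `CountVectorsFeaturizer`. This shows the intended setup, not a confirmation that it behaves as hoped.

```yaml
nlu:
  # Hypothetical out-of-scope intent: one example with the OOV placeholder
  # instead of listing "school", "noise", "peanut butter", ...
  - intent: out_of_scope
    examples: |
      - I don't like oov
  # Hypothetical in-scope intent: examples with concrete, known words
  - intent: dislike_animal
    examples: |
      - I don't like dogs
      - I don't like cats
```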