I can’t figure out this warning, which I get both when validating data and when training a new model: “The out of vocabulary token ‘oov’ was configured, but could not be found in any one of the ResponseSelector training examples. All unseen words will be ignored during prediction.”
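The Python sketch below illustrates the general idea behind an OOV token in a bag-of-words featurizer, which is what the warning's last sentence refers to. It is not Rasa's `CountVectorsFeaturizer` code. The functions `build_vocab` and `featurize` and the sample words are hypothetical, and Rasa's real tokenization and n-gram handling are more involved. The point is that an unseen word can only be mapped onto the OOV token if that token was part of the training vocabulary. If it was not, the word contributes nothing.

```python
# Minimal sketch, NOT Rasa's implementation: a whitespace bag-of-words featurizer
# with an out-of-vocabulary (OOV) token, showing why unseen words are "ignored"
# when the OOV token never occurs in the training examples.
from typing import Dict, List


def build_vocab(training_examples: List[str]) -> Dict[str, int]:
    """Assign an index to every lowercased whitespace token seen in training."""
    vocab: Dict[str, int] = {}
    for example in training_examples:
        for token in example.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def featurize(text: str, vocab: Dict[str, int], oov_token: str) -> List[int]:
    """Count known tokens; map unseen tokens to the OOV token if it is in the vocabulary."""
    oov = oov_token.lower()
    counts = [0] * len(vocab)
    for token in text.lower().split():
        if token not in vocab:
            if oov not in vocab:
                continue  # OOV token never seen in training: the unseen word is dropped
            token = oov
        counts[vocab[token]] += 1
    return counts


if __name__ == "__main__":
    oov_token = "oov_niet_voor_bedoeld_"
    # Hypothetical training sentences: one contains the OOV token, one does not.
    with_oov = build_vocab(["ik vind oov_niet_voor_bedoeld_ vies"])
    without_oov = build_vocab(["ik vind spruitjes vies"])
    # "broccoli" is unseen in both vocabularies.
    print(featurize("ik vind broccoli vies", with_oov, oov_token))     # [1, 1, 1, 1]
    print(featurize("ik vind broccoli vies", without_oov, oov_token))  # [1, 1, 0, 1]
```

In the second call the unseen word leaves no trace in the feature vector. That is what "All unseen words will be ignored during prediction" means.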
In the Rasa Open Source documentation, I can’t find anything about the ResponseSelector needing the ‘oov’ token.
I’ve also looked at the Rasa demo on GitHub (RasaHQ/rasa-demo: “Sara - the Rasa Demo Bot: An example of a contextual AI assistant built with the open source Rasa Stack”), but couldn’t figure out how to configure a response selector that gets rid of the warning.
So:
- What training examples do I have to add, in addition to the ones I already have, like “- ik vind oov_niet_voor_bedoeld_ niet leuk”?
- Which ResponseSelector configuration should be used for this OOV token?
------- Configuration and data -------
Rasa Open Source version: 3.1.7
data/nlu.yml:
..... (many intents before and after this one, so only the intent with the OOV token is shown here; its name roughly means “something other than what I am meant for”):
```yaml
- intent: Iets anders dan ik voor ben bedoeld
  examples: |
    - ik vind oov_niet_voor_bedoeld_ niet leuk
    - ik vind oov_niet_voor_bedoeld_ vies
    - ik vind oov_niet_voor_bedoeld_ niet lekker
    - ik lust geen oov_niet_voor_bedoeld_
    - houd je van oov_niet_voor_bedoeld_
    - wat is oov_niet_voor_bedoeld_
    - wie hebben v
    - wie zijn oov_niet_voor_bedoeld_
    - wat vind je van oov_niet_voor_bedoeld_
    - wat vind jij mooi oov_niet_voor_bedoeld_
    - wat vind je oov_niet_voor_bedoeld_
    - vind je oov_niet_voor_bedoeld_ goed
```
..... (and so on, many more examples containing the OOV token)
config:
```yaml
recipe: default.v1
language: nl
policies:
  - name: AugmentedMemoizationPolicy
    max_history: 5
  - name: RulePolicy
    core_fallback_threshold: 0.3
    core_fallback_action_name: action_default_fallback
    enable_fallback_prediction: false
  - name: TEDPolicy
    max_history: 5
    epochs: 80
    constrain_similarities: true
    model_confidence: softmax
pipeline:
  - name: WhitespaceTokenizer
    intent_tokenization_flag: true
    intent_split_symbol: +
  - name: RegexFeaturizer
    case_sensitive: false
    use_word_boundaries: true
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: word
    min_ngram: 1
    max_ngram: 3
    OOV_token: oov_niet_voor_bedoeld_
    use_shared_vocab: false
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    lowercase: true
    min_ngram: 1
    max_ngram: 4
  - name: RegexFeaturizer
    case_sensitive: false
  - name: DIETClassifier
    epochs: 80
    constrain_similarities: true
    model_confidence: softmax
    entity_recognition: true
  - name: ResponseSelector
    epochs: 80
    constrain_similarities: true
    model_confidence: softmax
    retrieval_intent: Q&A
  - name: EntitySynonymMapper
  - name: FallbackClassifier
    threshold: 0.45
    ambiguity_threshold: 0.1
```
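The word-level `CountVectorsFeaturizer` above sets `use_shared_vocab: false`. According to the Rasa docs, this means user messages and labels do not share one vocabulary. One plausible reading of the warning is that the OOV check runs separately per attribute. In that case the “ResponseSelector training examples” would be the response texts of the `Q&A` retrieval intent, which never contain `oov_niet_voor_bedoeld_`. This is an assumption, not something confirmed here. The Python sketch below illustrates only that assumption. `attributes_missing_oov`, the attribute names and the sample response are hypothetical and are not Rasa's code.

```python
# Minimal sketch, NOT Rasa's implementation: with separate (non-shared)
# vocabularies, an OOV-token check would run once per attribute, so it can
# fail for responses even when the user-message examples contain the token.
from typing import Dict, List


def attributes_missing_oov(
    examples_by_attribute: Dict[str, List[str]], oov_token: str
) -> List[str]:
    """Return the attributes whose non-empty training texts never contain the OOV token."""
    oov = oov_token.lower()
    missing: List[str] = []
    for attribute, texts in examples_by_attribute.items():
        tokens = [token for text in texts for token in text.lower().split()]
        if tokens and oov not in tokens:
            missing.append(attribute)
    return missing


if __name__ == "__main__":
    examples = {
        # User-message examples taken from data/nlu.yml above.
        "text": ["ik vind oov_niet_voor_bedoeld_ vies", "wat is oov_niet_voor_bedoeld_"],
        # Hypothetical retrieval-intent response text (e.g. for Q&A).
        "response": ["Daar ben ik niet voor bedoeld."],
    }
    print(attributes_missing_oov(examples, "oov_niet_voor_bedoeld_"))  # ['response']
```

Attributes with no text at all are skipped, so in this sketch only an attribute that has training text without the OOV token is reported.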