Customize OOV_token in CountVectorsFeaturizer?

piyush29programmer · October 11, 2019, 5:13pm

Hi,

As per the Rasa documentation for OOV_token:

Since the training is performed on limited vocabulary data, it cannot be guaranteed that during prediction an algorithm will not encounter an unknown word (a word that was not seen during training). In order to teach an algorithm how to treat unknown words, some words in training data can be substituted by the generic word OOV_token. In this case, during prediction, all unknown words will be treated as this generic word OOV_token.
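To make the quoted behaviour concrete, here is a minimal plain-Python sketch of the idea. It is not Rasa's implementation: the whitespace tokenizer, the `oov` token value, and the helper names `build_vocabulary` and `replace_unknown` are all assumptions made for illustration.

```python
# Illustrative only: a toy version of "unknown words become the OOV token".
# OOV_TOKEN is an assumed value; in Rasa it would be whatever OOV_token is set to.
OOV_TOKEN = "oov"


def build_vocabulary(training_sentences):
    """Collect every lowercase, whitespace-separated token seen in training."""
    vocabulary = {OOV_TOKEN}
    for sentence in training_sentences:
        vocabulary.update(sentence.lower().split())
    return vocabulary


def replace_unknown(sentence, vocabulary):
    """Replace every token that is not in the vocabulary with the OOV token."""
    tokens = sentence.lower().split()
    return " ".join(tok if tok in vocabulary else OOV_TOKEN for tok in tokens)


training = ["tell me about any nearby indian restaurant"]
vocab = build_vocabulary(training)

# "more", "community" and "centers" were never seen during training.
print(replace_unknown("tell me more about any nearby indian community centers", vocab))
# prints: tell me oov about any nearby indian oov oov
```

The point of the sketch is that the model only learns something about the generic token if that token also appears in the training data; otherwise it is just one more unseen word.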

I have a similar situation: the user asks a question with the wrong intention (unrelated to what the bot is made for), and the bot still recognizes it as one of the trained intents.

Example: a restaurant bot, negative scenario.

NLU is trained for - Tell me about any nearby Indian restaurant?
Answer - The nearest Indian restaurant is at church street and 2km from your location.

User - Tell me more about any nearby Indian community centers? (This is an out-of-scope question, but it uses similar words.)

Bot reply - The nearest Indian restaurant is at church street and 2km from your location.

We tried adding this question to outofscope, but there can be many such scenarios where the user asks an out-of-scope question that resembles an intent in the trained model.

After going through OOV_token, it seems it might be useful to add "community center" to OOV_token and to the outofscope intent, so that if the user asks the same question it falls into outofscope. But the issue remains the same: how many keywords should I add?

Is there any option where I can add a list of words, so that when Rasa NLU finds an untrained word it looks in the list and, if the word isn't there, either goes to the fallback policy or at least doesn't pick the trained intent?

akelad · October 17, 2019, 1:36pm

@piyush29programmer you can also add sentence examples with the oov token directly to your training data. Then you don’t need to list keywords.

So you could have e.g. “Tell me more about any nearby Indian oov oov?”
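One way to produce such examples is to write the out-of-scope questions normally and mask the off-domain words before adding them to the training data. The sketch below is only an illustration in plain Python: the `mask_words` helper, the `oov` token value, and the word lists are hypothetical, and the output still has to be placed under your out-of-scope intent in whatever training data format your Rasa version uses.

```python
import re

# Assumed to match the OOV_token configured for the featurizer.
OOV_TOKEN = "oov"


def mask_words(sentence, words_to_mask, oov_token=OOV_TOKEN):
    """Replace each listed word in `sentence` with the OOV token (case-insensitive)."""
    masked = {word.lower() for word in words_to_mask}

    def substitute(match):
        word = match.group(0)
        return oov_token if word.lower() in masked else word

    # Only runs of letters/apostrophes are treated as words; punctuation is kept.
    return re.sub(r"[A-Za-z']+", substitute, sentence)


# Hypothetical out-of-scope questions, each with the words that are off-domain.
out_of_scope_questions = [
    ("Tell me more about any nearby Indian community centers?", ["community", "centers"]),
]

examples = [mask_words(text, words) for text, words in out_of_scope_questions]
for example in examples:
    print(example)
# prints: Tell me more about any nearby Indian oov oov?
```

The word lists here are only used while authoring the examples. Once the model has seen sentences containing the token, unseen words at prediction time are handled through the same token, so no keyword list is needed at runtime.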
