How to log sentences containing OOV words and explicitly mark OOV words?

Hey,

how would I go about logging OOV words and the sentences they appear in? From what I can tell, NLU doesn’t explicitly mark them.

Br Dan

I guess you’re using the tensorflow pipeline?

you have to add this to your config file:

- name: "intent_featurizer_count_vectors"
  OOV_token: oov

And then add a few sentences to your training data that contain this OOV token.
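As I understand it, a few training examples with the OOV token in place of unknown words might look like this (intent name and sentences are just illustrative):

```
## intent:ask_card
- what's my oov card number
- what's my credit card oov
```

At prediction time, the featurizer then maps any word that is not in its vocabulary onto this token.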

By “OOV token”, do you mean I have to specify some unknown words under the entity name “oov”?

for example

“what’s my credit card number ?”

In the above input, say I want the NLU to tag “credit” as OOV. Do I have to set “credit” as an entity under the name “oov”?

Please clarify.

Maybe have a look at how I do it:

and tell me if you get the same behaviour, i.e. that using OOV tokens leads to unexpected results?

Sorry for not being clear enough.

From your answer I take it that I’d do something like:

## intent:mood_okay
- i am fine oov

Message: i am fine asckjnascjknsajcn
DEBUG:rasa_core.tracker_store:Recreating tracker for id 'xxx'
DEBUG:rasa_core.processor:Received user message 'i am fine asckjnascjknsajcn' with intent '{'name': 'mood_okay', 'confidence': 0.9735773801803589}' and entities '[]'

However, when I do that, how do I recognize the OOV word so that I can log it?

For example to get back:

DEBUG:rasa_core.processor:Received user message 'i am fine asckjnascjknsajcn' with intent '{'name': 'mood_okay', 'confidence': 0.9735773801803589}' and entities '[]', OOV_strings '["asckjnascjknsajcn"]'

Then I can save the whole sentence together with the OOV_strings, so I can add them to the training data later in case they make sense.

@Abir no, you don’t need to label any entity; just add sentences like “what’s my oov card number” or “what’s my credit card oov”.

@deW1 yes, that’s how you add them, but the words that are found to be OOV don’t get logged at the moment. You can probably implement that in a custom way.
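Until that’s supported out of the box, here is a minimal sketch of what such a custom implementation could look like. It is not part of the Rasa API: the `find_oov_words` helper and the `vocabulary` set are assumptions, and you would have to collect the vocabulary from your own training examples.

```python
import logging

logger = logging.getLogger(__name__)

def find_oov_words(message, vocabulary):
    """Return the tokens in `message` that do not appear in the training vocabulary."""
    tokens = message.lower().split()
    return [t for t in tokens if t not in vocabulary]

# Hypothetical vocabulary, collected from your NLU training examples.
vocabulary = {"i", "am", "fine", "what's", "my", "credit", "card", "number"}

message = "i am fine asckjnascjknsajcn"
oov_strings = find_oov_words(message, vocabulary)
if oov_strings:
    # Log the full sentence together with the OOV words, as asked above.
    logger.debug("Message %r contains OOV words: %r", message, oov_strings)
```

You could then persist the logged sentence/OOV pairs and review them periodically to decide which words are worth adding to the training data.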

thank you

@Abir Have you tried it? What is your experience (compared to my post above)?

What I felt is that using OOV can sometimes spread like wildfire. If the percentage of training data containing the OOV token is high, the NLU is more likely to tag every user input as out-of-vocabulary. OOV can be used to a certain extent, but I prefer creating new labels/intents to classify all garbage data. That worked out for me.
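The intent-based alternative described above could be sketched like this (intent name and gibberish examples are purely illustrative):

```
## intent:out_of_scope
- asdkjhaskjdh
- qwertyuiop hjkl
- blah blah blah
```

Inputs resembling these then get classified under the dedicated garbage intent instead of leaking into the real intents, and the bot can respond with a fallback message.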