Hey,
how would I go about logging OOV words and the sentences they appear in? From what I can tell, NLU doesn’t explicitly mark them.
Br Dan
I guess you’re using the tensorflow pipeline?
you have to add this to your config file:
- name: "intent_featurizer_count_vectors"
  OOV_token: oov
And then add a few sentences to your training data containing this oov token
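Putting the pieces together, a full config file for the legacy tensorflow pipeline might look roughly like this (a sketch; the exact tokenizer and component list depend on your setup):

```yaml
language: "en"

pipeline:
- name: "tokenizer_whitespace"
- name: "intent_featurizer_count_vectors"
  OOV_token: oov
- name: "intent_classifier_tensorflow_embedding"
```

With `OOV_token: oov` set, any word not seen during training is mapped to the `oov` token at prediction time, so sprinkling `oov` into a few training sentences teaches the classifier what to do with unknown words.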
By “OOV token”, do you mean I have to specify some unknown words under the entity name “oov”?
For example:
“what’s my credit card number?”
In the above input, say I want the NLU to tag “credit” as OOV. Do I have to set “credit” as an entity under the name “oov”?
Please clarify.
Maybe have a look at how I do it,
and tell me if you get the same behaviour, i.e. that using OOV tokens leads to unexpected results?
Sorry for not being clear enough.
From your answer I take it that I’d do something like:
## intent:mood_okay
- i am fine oov
Message: i am fine asckjnascjknsajcn
DEBUG:rasa_core.tracker_store:Recreating tracker for id 'xxx'
DEBUG:rasa_core.processor:Received user message 'i am fine asckjnascjknsajcn' with intent '{'name': 'mood_okay', 'confidence': 0.9735773801803589}' and entities '[]'
However, when I do that, how do I recognize the OOV word so that I can log it?
For example to get back:
DEBUG:rasa_core.processor:Received user message 'i am fine asckjnascjknsajcn' with intent '{'name': 'mood_okay', 'confidence': 0.9735773801803589}' and entities '[]', OOV_strings '["asckjnascjknsajcn"]'
Then I can take the whole sentence and the OOV_strings and save them, so that I can add them to the training data later in case they made sense.
@Abir no, you don’t need to label any entity; just add sentences like “what’s my oov card number” or “what’s my credit card oov”
@deW1 yes, that’s how you add them, but the words that get mapped to OOV don’t get logged at the moment. You can probably implement that in a custom way
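For the “custom way” of logging: since the pipeline doesn’t expose OOV matches, one option is a small standalone helper that rebuilds the training vocabulary and flags unseen tokens before (or alongside) the NLU call. This is just a sketch, not a Rasa component; the function names `build_vocabulary` and `find_oov` and the simple regex tokenization are my own assumptions and may not exactly match the featurizer’s token pattern:

```python
import logging
import re

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# NOTE: hypothetical helpers, not part of the Rasa API.

def build_vocabulary(training_examples):
    """Collect every lowercased token that appears in the training data."""
    vocab = set()
    for text in training_examples:
        vocab.update(re.findall(r"\w+", text.lower()))
    return vocab

def find_oov(message, vocab):
    """Return the tokens in `message` that never appeared in training."""
    tokens = re.findall(r"\w+", message.lower())
    return [t for t in tokens if t not in vocab]

def log_if_oov(message, vocab):
    """Log the sentence together with its OOV tokens, mimicking the desired log line."""
    oov = find_oov(message, vocab)
    if oov:
        logger.debug("Received user message %r, OOV_strings %r", message, oov)
    return oov
```

Usage, with the examples from this thread:

```python
vocab = build_vocabulary(["i am fine oov", "what's my credit card number"])
log_if_oov("i am fine asckjnascjknsajcn", vocab)  # logs OOV_strings ['asckjnascjknsajcn']
```

You could then append the message and its OOV strings to a file and review them periodically for new training data.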
thank you
@Abir, have you tried it? What is your experience (compared to my post above)?
What I’ve found is that using OOV can sometimes spread like wildfire: if the percentage of training data containing the OOV token is high, the NLU is more likely to treat every user input as out of scope. OOV can be used to a certain extent; I prefer creating new labels/intents to classify all garbage data. That worked out for me.