OOV-token for tensorflow embedding

datistiquo · October 8, 2018, 7:56pm

I added the following to my training data (shown in Chatito format for readability):

%[out_of_scope]
    OOV OOV
    OOV OOV OOV
    OOV OOV OOV OOV

because I want sentences with more than two unknown words to get an out-of-scope intent. I hope I understood the docs correctly and this is the right way to do it?

My config:

pipeline:
- name: "intent_featurizer_count_vectors"
  "OOV_token": OOV # string or None
  "OOV_words": []  # list of strings
- name: "intent_classifier_tensorflow_embedding"

Now even well-trained sentences in which all words are known get classified as out_of_scope?!

But I created the out_of_scope intent only for OOV tokens, so I would expect that sentences with known words are never classified as out_of_scope, even if they contain just one known word?

I also added an intent greet with

%[greet]
    hallo
    hi
    hey
    wie geht es

Now, for a single OOV token I get the intent greet. This token is definitely not in the training data, so it is OOV. Why does it get classified as an intent whose training examples contain no OOV tokens at all?

Help?

akelad · October 10, 2018, 8:38am

What exactly do you want to happen with just one OOV token? To keep the null intent? The config you provided should work, yes, though you don’t need the "OOV_words": [] line, just OOV_token: OOV.

If you have just one OOV token in the sentence, it should get classified as the out_of_scope intent, since it’s most similar to that. Your performance might be a little inconsistent, though, if those two intents are the only ones you have.
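
To illustrate the "most similar" argument, here is a minimal sketch of OOV replacement in a bag-of-words featurizer. It is not Rasa's implementation: the cosine comparison is only a stand-in for the tensorflow embedding classifier, and the names `featurize` and `best_intent`, the whitespace tokenizer, and the lower-casing of the OOV token are all assumptions made for this example.

```python
from collections import Counter
import math

OOV = "oov"  # assumption: lower-cased stand-in for the configured OOV_token

# Training examples taken from the posts above.
TRAINING = {
    "out_of_scope": ["OOV OOV", "OOV OOV OOV", "OOV OOV OOV OOV"],
    "greet": ["hallo", "hi", "hey", "wie geht es"],
}


def tokenize(text):
    # Simplified tokenizer: lower-case and split on whitespace.
    return text.lower().split()


# The vocabulary is every token seen in training, including the OOV token.
VOCAB = sorted({tok for exs in TRAINING.values() for ex in exs for tok in tokenize(ex)})


def featurize(text):
    # Words missing from the vocabulary are replaced by the OOV token
    # before counting, so they all land in the same feature column.
    tokens = [tok if tok in VOCAB else OOV for tok in tokenize(text)]
    counts = Counter(tokens)
    return [counts[word] for word in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_intent(text):
    # Score each intent by its most similar training example.
    vec = featurize(text)
    scores = {
        intent: max(cosine(vec, featurize(ex)) for ex in exs)
        for intent, exs in TRAINING.items()
    }
    return max(scores, key=scores.get), scores
```

Under these assumptions, a single unknown word such as "blubb" only activates the OOV column, which it shares with the out_of_scope examples and with none of the greet examples. A trained classifier can still behave differently, so this shows the expectation rather than what the model is guaranteed to do.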

datistiquo · October 12, 2018, 1:53pm

Yes, keep it, but afterwards I apply custom logic for single OOV tokens.

None of my other intents contain OOV words, only out_of_scope. If a single OOV token gets classified as out_of_scope, that would be fine. But how is it possible for a single OOV token to be classified as any intent other than out_of_scope (like greet above) if none of the other intents have OOV in their training examples? Is this expected behaviour or a bug?
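
For reference, the kind of post-processing I mean could look roughly like the sketch below. It counts unknown words against the training vocabulary and only trusts the classifier when every word is known. The helper names `oov_positions` and `route`, the `single_oov` label, and the whitespace tokenization are hypothetical and not part of Rasa.

```python
def oov_positions(text, known_words):
    # Indices of tokens that do not appear in the training vocabulary.
    tokens = text.lower().split()
    return [i for i, tok in enumerate(tokens) if tok not in known_words]


def route(text, known_words, predicted_intent):
    # Hypothetical routing rule applied after the classifier has run.
    n_unknown = len(oov_positions(text, known_words))
    if n_unknown == 0:
        return predicted_intent  # all words known: keep the prediction
    if n_unknown == 1:
        return "single_oov"      # hand over to the custom single-OOV logic
    return "out_of_scope"        # two or more unknown words


known = {"hallo", "hi", "hey", "wie", "geht", "es"}
print(route("hallo", known, "greet"))
print(route("hallo blubb", known, "greet"))
print(route("foo bar baz", known, "greet"))
```

The threshold of two unknown words follows the out_of_scope training examples above and can be adjusted.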

akelad · October 12, 2018, 3:17pm

No, it’s not expected behaviour, but I can’t tell you whether it’s a bug either, since I don’t have any training data to base this on. I’ve personally never experienced this behaviour.
