OOV-token for tensorflow embedding

I added to my data the following (chatito format for better reading)


because I want sentences with more than two unknown words to be classified as an out-of-scope intent. I hope I understood the docs correctly and this is the right way to do it?

My config:

    - name: "intent_featurizer_count_vectors"
      "OOV_token": OOV # string or None
      "OOV_words": []  # list of strings
    - name: "intent_classifier_tensorflow_embedding"

Now even well-trained sentences in which all words are known get classified as out_of_scope?!

But I created the out-of-scope intent only for OOV tokens, so sentences made up of known words should never be classified as out of scope, even if they contain just one known word?

I also added an intent greet with

    wie geht es   

Now, for a single OOV token I get the intent greet. This token is definitely not in the training data, so it is OOV. Why is it classified as an intent whose training examples contain no OOV tokens at all?


What exactly do you want to happen with just one OOV token? To keep the null intent? Yes, the config you provided should work, though you don’t need the "OOV_words": [] line, just "OOV_token": OOV.

If you have just one OOV token in the sentence, it should be classified as the out_of_scope intent, since that is the intent it’s most similar to. Your performance might be a little inconsistent, though, if those two intents are the only ones you have.
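To illustrate why a single unknown word should look most like out_of_scope, here is a rough sketch of what the count-vectors step feeds to the classifier. It is plain Python, not the Rasa implementation: the training examples, the word "blubb", and the helper names `build_vocab`, `featurize` and `overlap` are all made up for illustration, and the real classifier learns embeddings rather than counting shared tokens.

```python
from collections import Counter

OOV_TOKEN = "oov"  # lower-cased form of the OOV token from the config

# Hypothetical training examples; the real data is not shown in this thread.
TRAIN = {
    "greet": ["wie geht es"],
    "out_of_scope": ["OOV OOV OOV", "OOV OOV OOV OOV"],
}

def build_vocab(train):
    """Collect every lower-cased token seen in training, including the OOV token."""
    return {tok.lower() for examples in train.values() for ex in examples for tok in ex.split()}

def featurize(text, vocab):
    """Replace tokens that were never seen in training with the OOV token, then count."""
    tokens = [tok if tok in vocab else OOV_TOKEN for tok in text.lower().split()]
    return Counter(tokens)

def overlap(a, b):
    """Number of token occurrences two bag-of-words counters have in common."""
    return sum((a & b).values())

vocab = build_vocab(TRAIN)
message = featurize("blubb", vocab)  # the unknown word becomes the OOV token
for intent, examples in TRAIN.items():
    print(intent, [overlap(message, featurize(ex, vocab)) for ex in examples])
```

With this toy data the message shares no tokens with the greet example and one token with each out_of_scope example, so on the input side nothing points towards greet.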

Yes, keep it, but afterwards I apply custom logic for single OOV tokens.
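The custom logic itself is not shown in this thread. Purely as an assumption about what such a post-processing step could look like, the sketch below counts the tokens that are missing from a known-word set and reroutes messages with exactly one unknown token. The names `count_oov`, `route` and the label `single_oov` are hypothetical and not part of Rasa.

```python
def count_oov(text, known_words):
    """Count whitespace-separated tokens that are not in the known-word set."""
    return sum(1 for tok in text.lower().split() if tok not in known_words)

def route(text, predicted_intent, known_words):
    """Keep the predicted intent unless the message has exactly one unknown token."""
    if count_oov(text, known_words) == 1:
        return "single_oov"  # hypothetical label handled by custom code
    return predicted_intent

# Hypothetical vocabulary taken from the greet example in this thread.
known_words = {"wie", "geht", "es"}
print(route("blubb", "out_of_scope", known_words))
print(route("wie geht es", "greet", known_words))
```

Because this check only looks at the vocabulary, it works regardless of which intent the classifier returns for the single unknown token.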

None of my other intents have OOV words in their training examples, only out_of_scope does. If a single OOV token gets classified as out_of_scope, that would be fine. But how is it possible for a single OOV token to be classified as any intent other than out_of_scope (like greet above) if no other intent has OOV in its training examples? Is this expected behaviour or a bug?

No, it’s not expected behaviour, but I can’t tell you whether it’s a bug either, since I don’t have any training data to base this on. I’ve personally never experienced this behaviour.