Hi, I'm trying to understand exactly how the SKLearnIntentClassifier works, and I couldn't find a resource that explains which features it uses.
What surprised me was the intent classification of messages consisting of only a single token that has no word vector (and may even be OOV). Extra points for anyone who can tell me why there are lots of words in the vocab that don't have vectors.
For example, a simple greeting in German is ‘Hallo’, which should be recognized as a greet intent. Another way to say ‘Hallo’ is ‘Moin’, but unlike ‘Hallo’, ‘Moin’ does not have a word vector. If ‘Moin’ is not part of my training data for greet, the intent is misclassified, but when I add that exact word to the training data, it gets classified correctly with high confidence, even though it's still an OOV word. Other ways to say ‘Hallo’ still get classified poorly. So what features does the intent classifier rely on in this case? Is there a feature for exact matching against a training example?