I am using Rasa with the default configuration and the pretrained_embeddings_spacy pipeline for intent classification. For one intent I defined 30 training sentences such as “Give me an example”, “Can I have an example”, “One example please”, etc.
After training the pipeline, including the SVM classifier, the intent recognition results are very poor. Even for an exact match from the training data, “Give me an example”, the probability of the intent is only 0.08, which is below my threshold of 0.2. Since every training sentence for this intent contains the word “example” and no other intent's sentences do, I would expect a much higher probability.
Any ideas on how the intent classification can be improved?
The confidence was slightly better with supervised_embeddings, but not by much.
How easy is it to add stop-word removal or TF-IDF weighting to the word vectors? And can I output the word vectors of my sentences for debugging?
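For reference, TF-IDF weighting of word vectors usually means replacing the plain average of token vectors with a weighted average, where each token's weight is its term frequency times its inverse document frequency. Below is a minimal, self-contained sketch of that idea plus optional stop-word removal. It is not Rasa's or spaCy's implementation: `TOY_VECTORS`, `TOY_CORPUS`, `idf_weights` and `tfidf_sentence_vector` are hypothetical names, and the toy vectors stand in for spaCy's `token.vector`. The IDF uses the same smoothed form as scikit-learn's `smooth_idf` option.

```python
import math
from collections import Counter

# Hypothetical 3-d toy word vectors. With the spaCy pipeline the real values
# would come from token.vector; these exist only so the sketch runs on its own.
TOY_VECTORS = {
    "give": [1.0, 0.0, 0.0],
    "me": [0.0, 1.0, 0.0],
    "an": [0.0, 1.0, 0.0],
    "example": [0.0, 0.0, 1.0],
}

# Hypothetical tokenized training sentences from several intents. IDF is
# computed over all intents, so "example" ends up rarer than "give" or "me".
TOY_CORPUS = [
    "give me an example".split(),
    "one example please".split(),
    "give me the weather".split(),
    "give me a refund".split(),
    "can you give me directions".split(),
]


def idf_weights(corpus):
    """Smoothed IDF per token: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(corpus)
    df = Counter()
    for sentence in corpus:
        df.update(set(sentence))  # count each token at most once per sentence
    return {tok: math.log((1 + n_docs) / (1 + count)) + 1.0 for tok, count in df.items()}


def tfidf_sentence_vector(tokens, vectors, idf, stop_words=frozenset()):
    """Weighted mean of word vectors (weight = tf * idf), skipping stop words and OOV tokens."""
    tf = Counter(t for t in tokens if t not in stop_words and t in vectors)
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in tf.items():
        weight = count * idf.get(tok, 1.0)  # tokens missing from the IDF table get weight 1
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # every token was filtered out: return the zero vector
    return [x / weight_sum for x in total]


if __name__ == "__main__":
    idf = idf_weights(TOY_CORPUS)
    print(tfidf_sentence_vector("give me an example".split(), TOY_VECTORS, idf,
                                stop_words={"me", "an"}))
```

With “me” and “an” removed, the example sentence comes out at roughly [0.41, 0.0, 0.59]: “example” appears in fewer sentences than “give”, so it gets the larger share of the weight. Stop-word removal matters here too, because in a small corpus a function word like “an” can look rare and receive a high IDF.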
You need to hack into SpacyFeaturizer to see the word vectors. For stop-word removal with the spaCy pipeline, you'd need to write a custom component.
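Here is a rough sketch of the logic such a custom component could run on a spaCy `Doc`: print each token's vector for debugging, drop stop words, and average what remains. It relies only on spaCy's `Token` attributes `text`, `is_stop`, `has_vector` and `vector`. The function name `mean_vector_without_stop_words` and the `FakeToken` stand-in are hypothetical. The Rasa `Component` wiring is left out on purpose, because the custom component API differs between Rasa versions.

```python
from collections import namedtuple

# Stand-in for spaCy tokens so the sketch runs without downloading a model.
# Real spaCy Token objects expose the same attribute names.
FakeToken = namedtuple("FakeToken", "text is_stop has_vector vector")


def mean_vector_without_stop_words(tokens, debug=False):
    """Average the vectors of tokens that are not stop words and have a vector.

    `tokens` can be a spaCy Doc or any iterable of objects with spaCy's Token
    attributes `text`, `is_stop`, `has_vector` and `vector`.
    Returns None if every token is filtered out.
    """
    tokens = list(tokens)  # allow iterating twice (debug print + averaging)
    if debug:
        for t in tokens:
            # Print only the first few dimensions; real spaCy vectors have hundreds.
            print(f"{t.text!r:>12} stop={t.is_stop} vector[:3]={list(t.vector[:3])}")
    kept = [t for t in tokens if not t.is_stop and t.has_vector]
    if not kept:
        return None
    # Column-wise mean; works for lists of floats and for numpy arrays alike.
    return [sum(col) / len(kept) for col in zip(*(t.vector for t in kept))]


if __name__ == "__main__":
    tokens = [
        FakeToken("Give", False, True, [1.0, 0.0]),
        FakeToken("me", True, True, [0.0, 1.0]),
        FakeToken("an", True, True, [0.0, 1.0]),
        FakeToken("example", False, True, [0.0, 3.0]),
    ]
    print(mean_vector_without_stop_words(tokens, debug=True))  # [0.5, 1.5]
```

With spaCy installed and a model that ships word vectors (for example `en_core_web_md`), the same function can be called as `mean_vector_without_stop_words(nlp("Give me an example"), debug=True)` to inspect the real vectors. Which words count as stop words then depends on spaCy's built-in English list.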