I am using Rasa with the default configuration and the pretrained_embeddings_spacy pipeline for intent classification. For one intent I defined 30 training sentences like “Give me an example”, “Can I have an example”, “One example please”, etc.
After running the pipeline and training the SVM classifier, the intent recognition results are very poor. Even when I use an exact match from the training data, “Give me an example”, the probability of the intent is only 0.08 and therefore below my threshold (0.2). Note that every training sentence contains the word “example” and no other intent does, so I would expect a much higher probability.
Any ideas how the intent classification can be improved?
Is it still the correct intent? How many intents do you have?
Yes, the intent is the right one, but the confidence is too low. There are 12 intents.
Any ideas what the problem could be, or is it normal to have such a low confidence?
Is there any stop word removal in spaCy by default?
This is the confidence of the SVM classifier. It could be low due to the lack of training data.
As a test, what if you change your pipeline to supervised_embeddings and retrain?
Do you get better responses?
The confidence was a bit better with supervised_embeddings, but not much.
How easy is it to add stop word removal or TF-IDF weighting to the word vectors? And can I output the word vectors of my sentences for debugging?
You need to hack into SpacyFeaturizer to see the word vectors. For stop word removal, if you use the spaCy pipeline, you'd need to write a custom component.
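To illustrate why stop words can matter here, below is a minimal sketch in plain Python. It is not Rasa's or spaCy's actual code. It assumes the sentence feature is the plain average of the token vectors, which as far as I know is what the spaCy featurizer does. The 3-dimensional vectors, the stop word set, and the function names (`sentence_vector`, `cosine`) are all made up for the illustration; in a real setup the vectors would come from spaCy (`token.vector`) and the stop word flag from `token.is_stop`.

```python
import math

# Toy 3-d word vectors, invented for this illustration only.
# Real spaCy vectors typically have a few hundred dimensions.
TOY_VECTORS = {
    "give": [0.9, 0.1, 0.0],
    "me": [0.8, 0.2, 0.0],
    "an": [0.7, 0.3, 0.0],
    "example": [0.0, 0.1, 0.9],
}

# Hypothetical stop word list; spaCy ships its own per language.
STOP_WORDS = {"give", "me", "an"}


def sentence_vector(tokens, drop_stop_words=False):
    """Average the toy vectors of the tokens, optionally skipping stop words."""
    kept = [t for t in tokens if not (drop_stop_words and t in STOP_WORDS)]
    if not kept:
        # Every token was a stop word: fall back to all tokens
        # so that we never average over an empty list.
        kept = list(tokens)
    dim = len(TOY_VECTORS[kept[0]])
    return [sum(TOY_VECTORS[t][d] for t in kept) / len(kept) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


tokens = ["give", "me", "an", "example"]
plain = sentence_vector(tokens)
filtered = sentence_vector(tokens, drop_stop_words=True)

# Print each token's vector for debugging, then compare how close each
# sentence vector stays to the keyword that identifies the intent.
for t in tokens:
    print(t, TOY_VECTORS[t], "stop" if t in STOP_WORDS else "content")
print("plain mean:   ", plain, cosine(plain, TOY_VECTORS["example"]))
print("filtered mean:", filtered, cosine(filtered, TOY_VECTORS["example"]))
```

In this toy setup the three filler words pull the averaged vector away from “example”, so the one word that identifies the intent carries only a quarter of the weight. Whether that is what happens with your real data is something you would have to check by printing the actual spaCy vectors the same way.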
Hey, how did you achieve this in the end?