I asked my Rasa bot a question like “What is the number of people in a county”. That exact question is not in any of my NLU example data, and the bot classifies the utterance with some random intent. However, I do have an intent with an example like “number of schools”. So instead of picking a random intent, I would expect the bot to classify this utterance as the “number of schools” intent, because that is the most similar NLU example present. Should I customize my NLU pipeline and the number of epochs to make the NLU training better? Thanks
What pipeline are you currently using?
What you perceive as “most similar” is not necessarily the same as what the machine learning pipeline uses to classify the intent.
The usual approach for improving your bot is to add more training data. In particular, take the messages that are currently misclassified and add them as NLU examples under the correct intent.
Does “What is the number of people in a county” actually belong to the intent with “number of schools”? If yes, add it as an nlu example, then your bot will be better in the future. If no, then it’s okay that the bot classifies it as something random, there is no reason why it has to be classified as the “number of schools” intent.
Thanks @chkoss for the early reply. I am currently using this pipeline:
language: en
pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 150
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 150

I am making a bot that mostly uses terms specific to one industry. Should I try a different pipeline, train the bot again, and compare the results? If yes, what changes do you suggest? Otherwise, I think the only way to increase the accuracy of intent classification is to increase the number of examples. Thanks again
Yes, increasing the number of examples is an important step for increasing accuracy. But adjusting the pipeline can sometimes help, too.
You could try using ConveRTTokenizer followed by ConveRTFeaturizer instead of the WhitespaceTokenizer. This featurizer has pre-trained word embeddings, which means it comes with some knowledge about, e.g., which English words/phrases are similar to which other English words. See this docs page for more explanation. It might not help in your case if the terms are very industry-specific, but it is probably worth a try.
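As a rough sketch of that suggestion, the pipeline posted earlier in this thread could be adapted as follows. This assumes a Rasa version that still ships ConveRTTokenizer (it needs the rasa[convert] extras) and keeps every other component and setting exactly as in the original config. Treat it as a starting point to compare against the current pipeline, not a guaranteed improvement.

```yaml
# Sketch only: assumes a Rasa release that includes ConveRTTokenizer
# and that the optional ConveRT dependencies are installed.
language: en
pipeline:
  # ConveRTTokenizer takes the place of WhitespaceTokenizer
  - name: ConveRTTokenizer
  # Adds pre-trained dense features on top of the sparse ones below
  - name: ConveRTFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 150
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 150
```

Keeping the CountVectorsFeaturizer entries means DIETClassifier still sees the word and character n-gram features, which may matter for industry-specific terms that the pre-trained embeddings have not seen.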
Here is some advice on how to compare the performance of different pipelines.
Thanks @chkoss, I created a new virtual environment and ran pip install rasa[convert] there so that I could use the ConveRTTokenizer, but I am getting an error. I am using Windows 10. Here is the screenshot of the error.