Problem in nlu intent classification

devesh · June 10, 2020, 8:58am

I asked my Rasa bot a question like: “What is the number of people in a county”. That exact question is not present in any of my NLU example data, and the bot classifies the utterance with some random intent. However, I have an intent with an example like “number of schools”. So instead of picking a random intent, I would expect the bot to classify the utterance as that intent, because it is the most similar NLU example present. My question is: should I customize my NLU pipeline and the number of epochs to make the NLU training work better? Thanks

chkoss · June 11, 2020, 9:28am

What pipeline are you currently using?

What you perceive as “most similar” is not necessarily the same as what the machine learning pipeline uses to classify the intent.

The usual approach for improving your bot would be to add more training data. Especially for messages that are currently classified wrongly, add them as nlu examples to the correct intent.

Does “What is the number of people in a county” actually belong to the intent with “number of schools”? If yes, add it as an NLU example, and your bot will handle it better in the future. If no, then it’s okay that the bot classifies it as something random; there is no reason why it has to be classified as the “number of schools” intent.

devesh · June 11, 2020, 9:57am

Thanks @chkoss for the early reply. I am currently using this pipeline:

    language: en
    pipeline:
      - name: WhitespaceTokenizer
      - name: RegexFeaturizer
      - name: LexicalSyntacticFeaturizer
      - name: CountVectorsFeaturizer
      - name: CountVectorsFeaturizer
        analyzer: "char_wb"
        min_ngram: 1
        max_ngram: 4
      - name: DIETClassifier
        epochs: 150
      - name: EntitySynonymMapper
      - name: ResponseSelector
        epochs: 150

I am building a bot that mostly uses terms specific to one industry. Should I try a different pipeline, retrain the bot, and compare the results? If so, what changes would you suggest? Otherwise, I think the only way to increase the accuracy of intent classification is to increase the number of examples. Thanks again

chkoss · June 11, 2020, 11:12am

Yes, increasing the number of examples is an important step for increasing accuracy. But adjusting the pipeline can sometimes help, too.

You could try using ConveRTTokenizer followed by ConveRTFeaturizer instead of the WhitespaceTokenizer. This featurizer has pre-trained word embeddings, which means it comes with some knowledge about, e.g., which English words/phrases are similar to which other English words. See this docs page for more explanation. It might not help in your case if the terms are very industry-specific, but it is probably worth a try.
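For illustration, here is a rough sketch of what that change could look like when applied to the pipeline posted above. It assumes the Rasa 1.x component names ConveRTTokenizer and ConveRTFeaturizer, keeps every other component and the epochs values from the original config unchanged, and has not been tested on this data, so treat it as a starting point rather than a recommended setup.

```yaml
language: en
pipeline:
  # Assumption: ConveRTTokenizer takes the place of WhitespaceTokenizer
  - name: ConveRTTokenizer
  # Assumption: ConveRTFeaturizer is added right after its tokenizer
  - name: ConveRTFeaturizer
  # Everything below is carried over unchanged from the pipeline posted above
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 150
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 150
```

Keeping the CountVectorsFeaturizer entries alongside the ConveRT features means the industry-specific terms can still contribute word and character n-gram features, even if the pre-trained model does not know them.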

Here is some advice on how to compare the performance of different pipelines.

devesh · June 11, 2020, 2:35pm

Thanks @chkoss, I created a new virtual environment and ran pip install rasa[convert] there to use the ConveRTTokenizer, but I am getting an error. I am using Windows 10. Here is the screenshot of the error.

chkoss · June 11, 2020, 2:59pm

Ah, if you’re on Windows, unfortunately you can’t use the ConveRTTokenizer. It relies on tensorflow_text, which is not available for Windows.

You could still make it work using e.g. WSL for Windows or by running Rasa in Docker, but it might not be worth the effort.
