How large is a typical dataset for training Rasa NLU?

benoitdemaegdt · September 17, 2018, 2:03pm

Hi all!

I’m starting to use Rasa NLU for some classification tasks.

I’m wondering: how big should my training dataset be in order to achieve a pretty good classification?

I assume a lot of you have already built products using Rasa NLU, with more or less success. Could you share the performance you achieved, along with the size of your dataset and the number of intents? I guess it would help a lot of newcomers like me.

For example: I trained Rasa NLU with 100,000 sentences to classify 20 intents, and it reached 80% classification accuracy on my test dataset.

Thank you!

kothiyayogesh · September 17, 2018, 4:11pm

Hi,

Welcome to the community.

I am still building a bot using Rasa, but I will definitely share my experience once it is ready.

Unfortunately, there is no straightforward answer to this question. It depends a lot on your intents and entities.

If your intents or entities are easily confused with one another, you will definitely need more training data. The amount of training data needed for each intent also grows with every new intent or entity you add.

To gain confidence in the model, you can also evaluate it after training.
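
For the evaluation to mean anything, the test sentences should not be the ones the model was trained on. A minimal sketch of holding out a share of the examples per intent is below. The function name `split_per_intent`, the 20% fraction, and the sample sentences are all made up for illustration; this is plain Python, not part of Rasa.

```python
import random
from collections import defaultdict

def split_per_intent(examples, test_fraction=0.2, seed=0):
    """Split (text, intent) pairs into train and test lists, intent by intent."""
    by_intent = defaultdict(list)
    for text, intent in examples:
        by_intent[intent].append((text, intent))

    rng = random.Random(seed)
    train, test = [], []
    for items in by_intent.values():
        rng.shuffle(items)
        # Hold out at least one example per intent when possible,
        # but never all of them (a single example stays in training).
        n_test = max(1, int(len(items) * test_fraction))
        n_test = min(n_test, len(items) - 1)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test

# Hypothetical toy data: five sentences for each of two intents.
examples = [
    ("hello", "greet"), ("hi there", "greet"), ("good morning", "greet"),
    ("hey", "greet"), ("hi", "greet"),
    ("bye", "goodbye"), ("see you later", "goodbye"), ("goodbye", "goodbye"),
    ("talk to you soon", "goodbye"), ("have a nice day", "goodbye"),
]
train, test = split_per_intent(examples)
```

Splitting per intent rather than over the whole dataset keeps rare intents from ending up entirely in one of the two sets.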

You can use Chatito to generate more training data.
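
The general idea behind generating data from templates can be sketched in a few lines of Python. Note that this is not Chatito's own syntax or API, just an illustration of the technique; the `expand` helper, the template, and the slot values are hypothetical.

```python
from itertools import product

def expand(template, slots):
    """Fill a template with every combination of the given slot values."""
    names = list(slots)
    sentences = []
    for combo in product(*(slots[name] for name in names)):
        sentences.append(template.format(**dict(zip(names, combo))))
    return sentences

# Hypothetical template for a "book" intent: 2 greetings x 3 items.
sentences = expand(
    "{greeting}, I want to book a {item}",
    {"greeting": ["hi", "hello"], "item": ["flight", "hotel", "table"]},
)
```

Generated sentences tend to look very similar to each other, so they are probably best mixed with real user messages rather than used on their own.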

All the best.

benoitdemaegdt · September 17, 2018, 5:26pm

Hi,

Indeed, it makes sense that the more intents I have and the blurrier the boundary between two intents, the more data I will need for Rasa to find a pattern for each intent and classify sentences correctly. Thank you for pointing that out.

May I still ask how big your training dataset is?

Thanks

akelad · September 18, 2018, 4:07pm

Is 80% not pretty good? Take a look at the NLU evaluation script to see which intents are getting confused, then use that to optimize performance: https://rasa.com/docs/core/evaluation/
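
If you just want a rough idea of what "which intents are getting confused" means, here is a minimal sketch that counts wrong predictions per (true, predicted) pair. It assumes you already have two parallel lists of true and predicted intent names from your test set. The function names and the sample labels are invented for illustration, and this is not the Rasa evaluation script itself.

```python
from collections import Counter

def confused_pairs(true_intents, predicted_intents):
    """Return (true, predicted) pairs for wrong predictions, most frequent first."""
    pairs = Counter()
    for true, pred in zip(true_intents, predicted_intents):
        if true != pred:
            pairs[(true, pred)] += 1
    return pairs.most_common()

def accuracy(true_intents, predicted_intents):
    """Fraction of test sentences whose predicted intent matches the true one."""
    correct = sum(t == p for t, p in zip(true_intents, predicted_intents))
    return correct / len(true_intents)

# Hypothetical test-set labels and model predictions.
true_intents = ["greet", "greet", "goodbye", "order", "order", "order"]
predicted = ["greet", "goodbye", "goodbye", "order", "cancel", "cancel"]

pairs = confused_pairs(true_intents, predicted)
score = accuracy(true_intents, predicted)
```

The pairs at the top of the list are the intents that most likely need more, or more clearly distinct, training examples.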
