NLU intent training with large dataset

Hello, I am training a chemistry-experiment QA chatbot that can chat and answer questions about chemical experiments. I currently use a custom action to read the data from MongoDB, and the questions are trained as an NLU intent; this runs successfully. After increasing the amount of data, however, I found that Rasa NLU can no longer train the intent. Is there any other way to make it run successfully? Thank you.

(training failed error message)

First of all, tell me what types of data you have, and give me proper details.

@Rosscoffman

This is part of my data: a question and its corresponding answer. Most records look like this. I organized the question set into a CSV file and uploaded it through MongoDB Compass.

(image: sample of the question and answer data)

There are 777243 records in total

@nik202 Hi, sorry to trouble you.

After the dataset grows larger, Rasa NLU cannot train the intent normally.

Is there any way to improve it?

Hi, are you adding new classes of intents?

I only set the questions in the database as a single intent (question).

@dan246 OK, no worries. When you say "after the data set becomes larger", what exactly are you doing that increases the size of the dataset, and what do you mean by dataset?

Do you mean the Rasa NLU intents with the set of questions you are adding, or something else?

In Rasa there is no database (so I want to know what you meant by this).

Thanks.

@nik202

I use a custom action to let Rasa read the QA data in MongoDB Compass, convert the questions in it into an intent, and train NLU. After I increased the number of questions, an NLU training error occurred.

@dan246 it should not show an error, but you need to check the resources required to run Rasa. Check the Rasa documentation.

Second, check whether you are able to convert the data into NLU training data as a set of questions.

Third, try deleting the previously trained model, re-run training, and talk to your bot.

Fourth, try uploading in small batches so that you can figure out at what amount of data it breaks.

Otherwise, try writing code to fetch the data from the CSV; MongoDB may be the source of the issue.

Do this and let me know.
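The batching and CSV suggestions above could be sketched roughly as follows. This is only an illustration, not the code used in this thread: the column name `question`, the intent name `question`, the helper names, and the `version: "3.1"` header are all assumptions (older Rasa releases expect a different version string). It streams the CSV row by row and writes only the first `size` questions to an NLU YAML file, so a trial training run can be made on a small sample.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator


def iter_questions(csv_path: str, column: str = "question") -> Iterator[str]:
    """Stream non-empty questions from the CSV one row at a time."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            text = (row.get(column) or "").strip()
            if text:
                yield text


def write_nlu_file(questions: Iterable[str], intent: str, out_path: str) -> int:
    """Write questions as examples of one intent; return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Assumed format version; adjust to the installed Rasa release.
        out.write('version: "3.1"\n')
        out.write("nlu:\n")
        out.write(f"- intent: {intent}\n")
        out.write("  examples: |\n")
        for question in questions:
            # Collapse internal newlines so each example stays on one line.
            one_line = " ".join(question.split())
            out.write(f"    - {one_line}\n")
            count += 1
    return count


def write_sample(csv_path: str, size: int, out_path: str) -> int:
    """Write only the first `size` questions, for a trial training run."""
    sample = islice(iter_questions(csv_path), size)
    return write_nlu_file(sample, "question", out_path)
```

For example, `write_sample("qa.csv", 10000, "data/nlu.yml")` (both paths hypothetical) would produce a 10k-example file; the size can then be raised step by step.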

@nik202 Hi, I tried a small part of the data and it works, but when I put all the data in, it doesn’t work. After it shows this (image), my computer gets a Blue Screen of Death.

@dan246 as expected, your computer does not have the resources to handle the current amount of data, so increase the resources and see the difference.

@dan246 try again with a smaller dataset.

@nik202

770k data → Blue Screen of Death

260k data → Function call stack error

10k data → run successfully

Is it because of the performance of my computer that I can’t train successfully?
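With one size known to work (10k) and one known to fail (260k), the breaking point can be narrowed down by bisection. The sketch below is only an illustration: `trains_ok` is a hypothetical callable that would run a training attempt on the given number of examples and report success, and `fake_trains_ok` is a stand-in with an invented threshold so the code can run on its own.

```python
from typing import Callable


def largest_working_size(
    known_good: int,
    known_bad: int,
    trains_ok: Callable[[int], bool],
    tolerance: int = 1000,
) -> int:
    """Bisect between a size that trains and one that fails.

    Returns the largest tested size that trained successfully.
    """
    while known_bad - known_good > tolerance:
        middle = (known_good + known_bad) // 2
        if trains_ok(middle):
            known_good = middle
        else:
            known_bad = middle
    return known_good


def fake_trains_ok(size: int) -> bool:
    """Stand-in for a real training attempt; the threshold is invented."""
    return size <= 57_000


if __name__ == "__main__":
    print(largest_working_size(10_000, 260_000, fake_trains_ok))
```

A hard crash such as a Blue Screen of Death cannot be reported back by a predicate like this, so in practice it is safer to stay well below the 770k mark and watch memory usage during each attempt.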

@dan246 try creating a new intent and updating the code for the questions, i.e. questions1, and then try uploading more than 10k, i.e. 20-30k.

Meanwhile, let me explore this if I get some time.

@nik202 OK, I will try it :grin:

Do I need to show you actions.py now?

@dan246 no, just try this process and update me. Break the data into small sets and write the code for each set individually.
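Breaking the questions into small sets, each under its own intent name, might look like the sketch below. The function name, the `questions` prefix (following the `questions1` naming suggested above), and the chunk size are assumptions for illustration; this is not the code from the original `actions.py`.

```python
from typing import Dict, List, Sequence


def split_into_intents(
    questions: Sequence[str], chunk_size: int, prefix: str = "questions"
) -> Dict[str, List[str]]:
    """Group questions into consecutive chunks named questions1, questions2, ..."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    intents: Dict[str, List[str]] = {}
    for start in range(0, len(questions), chunk_size):
        # Chunk numbering starts at 1 to match the questions1 naming.
        name = f"{prefix}{start // chunk_size + 1}"
        intents[name] = list(questions[start:start + chunk_size])
    return intents
```

Each resulting set can then be written out and trained separately, which makes it easier to see which set, or which total size, causes the failure. Whether a classifier can usefully tell apart intents that were split only by position is a separate question this sketch does not address.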
