Rasa nlu train with a large dataset is stuck

KamyarGhajar · March 30, 2020, 7:06am

I am training the NLU model on a large (3 GB) file in the Markdown NLU training format. Training started 12 hours ago, but it appears to be stuck and has used up so many resources that the machine no longer accepts new SSH connections. Are there considerations I should keep in mind when training on such large files, or does rasa train nlu have options to use multiple CPU cores or the GPU?

stephens · March 31, 2020, 1:29am

Hi Kamyar,

That’s a huge dataset. I’m curious how many intents and utterances are in your dataset and how you generated it.

Either way, that dataset is too big.

Greg

KamyarGhajar · March 31, 2020, 8:36am

Hi Greg, and thank you for the response. I am using Rasa NLU for sequence tagging and intent extraction for address geocoding. The intents are the place types: for each map place in a country I have generated 2 or 3 sentences (similar sentence types for each place), each of which is an address. There are about 20 intents, like restaurant, shopping center, shop, street, etc., and the sentences carry tags such as neighborhood, city, etc. That is how the .md file grew to about 3 GB. I have also set the pipeline to use TensorFlow embeddings.

So I was wondering how I can configure the training phase to reduce the epochs from 300 to something like 10, and maybe use the GPU, so that the system does not get stuck loading the data. I don’t know what the problem is with training NLU on large data. By the way, I haven’t used lookup tables yet, as I need the model to learn to handle misspellings too; with lookups, the tables would be a bit large but the sentences would reduce to those 2-3 types.

Ghostvv · March 31, 2020, 10:19pm

So for a small number of epochs it trains fine? What version are you using?

KamyarGhajar · April 1, 2020, 7:16am

Yeah, I have set the TensorFlow epochs to 10 now, but the initial loading of the 3 GB file still makes it get stuck. The Rasa version is 1.8.

Ghostvv · April 1, 2020, 8:47am

So it cannot start training?

KamyarGhajar · April 1, 2020, 10:35am

It seems to get stuck in the data loading phase. I run the training in a Linux screen session on a machine I reach over SSH, and I give it plenty of time. Some time after training starts, I lose the SSH connection and have to restart the machine. When I check the screen logs afterwards, there is no output at all, so it seems training never actually started before the system froze and we had to reboot it.

Ghostvv · April 1, 2020, 12:44pm

Well… 3 GB is a lot of text data. It might run out of memory while loading or featurizing it.

KamyarGhajar · April 1, 2020, 9:08pm

The machine has 32 GB of RAM and 8 fast CPU cores, so it is not a weak system at all, with or without a GPU. Do you have a suggestion for working with such large data when training an NLU model for sequence labeling and intent extraction with Rasa?

Ghostvv · April 2, 2020, 10:05am

For such a huge amount of data, it should be loaded into memory in batches, meaning the whole Rasa NLU pipeline would need to be updated.

Ghostvv · April 2, 2020, 10:06am

What is this data? Is it generated? I would try with a smaller amount of real data first.

KamyarGhajar · April 2, 2020, 5:35pm

Is batch loading being developed, or should I wait, or maybe contribute?

KamyarGhajar · April 2, 2020, 5:37pm

Yeah, the data is generated for each place from predefined tagged sentences in the format Rasa NLU accepts. With about 100 MB of our data it works fine, but with the large file it does not.

Ghostvv · April 3, 2020, 1:36pm

We are not currently working on online loading of data. I’d recommend reducing the amount of generated data to what fits in the memory of your machine.
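One way to reduce a generated file without loading it all into memory is to thin it out while streaming. The following is only a minimal sketch: it assumes the file contains intent sections in the Rasa Markdown format (a "## intent:..." header followed by "- example" lines), and the function name, keep_fraction parameter, and file names are hypothetical. Any list-style section, lookup tables included, would be thinned the same way.

```python
import random
from typing import Iterable, Iterator


def downsample_nlu_markdown(
    lines: Iterable[str], keep_fraction: float, seed: int = 0
) -> Iterator[str]:
    """Yield non-example lines unchanged and each example line
    ("- ...") with probability keep_fraction."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    for line in lines:
        is_example = line.lstrip().startswith("- ")
        # Headers and blank lines always pass through; examples are sampled.
        if not is_example or rng.random() < keep_fraction:
            yield line


def downsample_file(src_path: str, dst_path: str, keep_fraction: float) -> None:
    """Stream src_path to dst_path, never holding more than one line in memory."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(downsample_nlu_markdown(src, keep_fraction))
```

For example, downsample_file("nlu.md", "nlu_small.md", 0.03) would keep roughly 3% of the examples, which for a 3 GB file lands near the 100 MB size reported to train fine earlier in this thread.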

KamyarGhajar · April 3, 2020, 2:48pm

Do you know the estimated machine configuration needed for about 3 GB of data with about half a dozen tags and 15-20 intents? Or maybe for 1 GB or so? Or should I test each size myself?

abhishakskilrock · April 3, 2020, 4:27pm

Hi @KamyarGhajar

I have trained my model on around 5 GB of Markdown data with around 500 different intents. I did it when rasa-nlu and rasa-core were separate, i.e. on rasa-nlu version 0.15, so I am sharing how I did it with that amount of data:

First, split the data into separate files, each small enough for your system memory to handle during training. Train on these files sequentially (don’t train in parallel, as that takes the same memory as the complete data) and keep all the models in a single directory.
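The splitting step could be sketched as follows. This is a hedged illustration, not the code used for the 0.15 setup: the function name, the max_examples parameter, and the nlu_part_NNNN.md naming are all hypothetical. It assumes the Rasa Markdown layout of "## ..." section headers followed by "- example" lines, and repeats the current header at the top of each new chunk so every file stays valid on its own.

```python
from pathlib import Path
from typing import Iterable, List


def split_nlu_markdown(
    lines: Iterable[str], out_dir: str, max_examples: int
) -> List[Path]:
    """Stream Markdown NLU lines into chunk files holding at most
    max_examples example lines each; return the chunk paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    handle = None
    header = None          # most recent "## ..." section header seen
    written_header = None  # header last written to the current chunk
    count = 0              # examples written to the current chunk
    try:
        for line in lines:
            if line.startswith("## "):
                header = line if line.endswith("\n") else line + "\n"
                continue
            if not line.lstrip().startswith("- "):
                continue  # skip blank lines and anything that is not an example
            if handle is None or count >= max_examples:
                if handle is not None:
                    handle.close()
                path = out / f"nlu_part_{len(paths):04d}.md"
                paths.append(path)
                handle = path.open("w", encoding="utf-8")
                count = 0
                written_header = None
            if header is not None and header != written_header:
                handle.write(header)  # (re)emit the section header
                written_header = header
            handle.write(line if line.endswith("\n") else line + "\n")
            count += 1
    finally:
        if handle is not None:
            handle.close()
    return paths
```

Passing an open file object as lines keeps memory use to one line at a time. Note that a file generated intent by intent will produce chunks covering only a few intents each, so the examples may need to be shuffled or interleaved first if every model is supposed to see every intent.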

Now load every model from the model directory and make your predictions.

P.S.

You may have to monkey patch the Rasa model loading code. I did it on version 0.15, but I am not sure whether you have to do it in the latest Rasa version.
Your prediction accuracy will be lower than with a single model. I don’t know why, but the accuracy will still be good; just tune the hyperparameters to get the best accuracy.
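The post does not say how the predictions of the separate models were combined, so the following is only one possible approach, stated as an assumption: average the per-intent confidences across all models and pick the highest. To avoid guessing at the Rasa loading API, each model is represented by a hypothetical callable that maps a text to a dictionary of intent confidences; wrapping real loaded models in such callables is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical stand-in for a loaded model: text -> {intent name: confidence}.
Predictor = Callable[[str], Dict[str, float]]


def combine_predictions(
    text: str, predictors: Sequence[Predictor]
) -> Tuple[str, float]:
    """Return the intent with the highest confidence averaged over all
    predictors, together with that averaged confidence."""
    if not predictors:
        raise ValueError("at least one predictor is required")
    totals: Dict[str, float] = defaultdict(float)
    for predict in predictors:
        for intent, score in predict(text).items():
            totals[intent] += score
    # An intent missing from a model's output counts as confidence 0 there.
    best = max(totals, key=totals.get)
    return best, totals[best] / len(predictors)
```

Averaging is a simple choice that treats every chunk model equally; taking the single most confident model, or weighting models by chunk size, would be equally plausible alternatives.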

KamyarGhajar · April 3, 2020, 6:13pm

Thank you very much for your very informative response, Abhishak. I will try your way too, then. The main concern is that you say I should use a previous version of NLU. Are you sure it won’t work in the new versions?

abhishakskilrock · April 3, 2020, 6:32pm

No, I never said it won’t work on the latest version; I said I used it on a previous version. I haven’t tested it on the latest version, but I think it will work there too. The Rasa developers are great coders and have changed many things, so you may have to monkey patch the load_model function according to your needs. If you need any help, just ping me anytime; I’ll be happy to help.
Best of luck for your work.
Enjoy & Happy Coding

KamyarGhajar · April 3, 2020, 7:12pm

Oh, I see. Thanks again Abhishak.

abhishakskilrock · April 3, 2020, 7:24pm

Happy to help @KamyarGhajar
