I’m trying to train an NLU model on around 47K examples, but it fails. I don’t see any errors in rasa-x or rasa-worker; training runs for around 30 minutes and finishes without a trained model.
Could you please advise? Is it caused by having too much NLU data, or by something else?
Hi @nhha1602, are you training this in the Rasa X UI? I doubt the problem is too much data. Can you see your data, configs, and domain in the UI, and do they all look correct?
Hi,
I’m using Rasa for Vietnamese; my config is below.
I tried training manually and found it runs out of memory (my server has 32 GB).
The process was killed when it started to load the CRFEntityExtractor.
Thanks for the logs. It seems like the language model is loading fine. I can test it locally with your config and the language model and everything runs. Did you start noticing the problem as the amount of your data increased? What happens if you try the same thing with a small dataset?
It is fine with a small dataset. I used Chatito to generate the data; its output has 11M records in total, and I take about 70% of them. Chatito created a JSON file, which I converted to a Markdown file, and then I ran rasa train.
So I think there is an issue with my pipeline … please review my config and comment on the pipeline. If it is good for my language (Vietnamese), I will try to increase my memory.
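Since the small dataset trains fine, one option before adding memory is to thin the Chatito output further. Below is a minimal sketch (the `fraction` value and the intent/example layout are assumptions about a typical Rasa Markdown NLU file, not taken from your data) that keeps a random fraction of `- example` lines under each `## intent:` header:

```python
import random

def subsample_nlu(lines, fraction=0.1, seed=42):
    """Keep a random `fraction` of "- example" lines in a Rasa
    Markdown NLU file (passed as a list of lines). Headers and
    other lines are kept as-is; at least one example per block
    is always retained. Example order within a block may change."""
    rng = random.Random(seed)
    out, examples = [], []

    def flush():
        # Emit a sampled subset of the example lines collected so far.
        if examples:
            keep = max(1, round(len(examples) * fraction))
            out.extend(rng.sample(examples, keep))
            examples.clear()

    for line in lines:
        if line.startswith("- "):
            examples.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return out
```

You could run this over the converted Markdown file, train on the reduced set, and increase `fraction` until you find the point where memory becomes a problem.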
I’m not familiar with Vietnamese, so I can’t comment on the features for the CRFEntityExtractor (you’d probably know better than me whether those make sense), but using a Vietnamese-specific language model for tokenization/featurization is a good start. Note that Rasa 1.8 also introduces some other options for language models.
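For reference, a Rasa 1.8 pipeline using those newer language-model components might look roughly like this. This is only a sketch: the `model_name`/`model_weights` values are example placeholders, and you would need to check which pretrained weights actually cover Vietnamese.

```yaml
language: vi

pipeline:
  - name: HFTransformersNLP
    model_name: "bert"
    # Example weights only; verify Vietnamese coverage before using.
    model_weights: "bert-base-multilingual-cased"
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  - name: DIETClassifier
    epochs: 100
```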
Glad you got it to work. I’d recommend taking a look at your options here (you’ll have to check what works for Vietnamese) and comparing pipelines with different components to see what gives you the best results. This will depend on your data, so the best approach is to experiment and see what works.