Thanks for the logs. It looks like the language model is loading fine. I tested it locally with your config and the language model, and everything runs. Did you start noticing the problem as the amount of data increased? What happens if you try the same thing with a small dataset?
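If it helps with trying a smaller dataset, here is a minimal sketch of one way to cut the training data down while keeping every intent represented. It assumes the data is in the Rasa NLU JSON training format (a top-level `rasa_nlu_data` object with a `common_examples` list, each example carrying an `intent` key); the function name `subsample_by_intent` and its parameters are made up for illustration and are not part of Rasa or Chatito.

```python
import random
from collections import defaultdict


def subsample_by_intent(data, fraction, seed=0):
    """Return a new Rasa NLU JSON dict with a random fraction of examples per intent.

    Hypothetical helper: assumes data["rasa_nlu_data"]["common_examples"] is a
    list of dicts that each have an "intent" key.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Group the examples by intent so that no intent disappears from the sample.
    by_intent = defaultdict(list)
    for example in data["rasa_nlu_data"]["common_examples"]:
        by_intent[example["intent"]].append(example)

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    sampled = []
    for intent in sorted(by_intent):
        examples = by_intent[intent]
        # Keep at least one example per intent, even for a tiny fraction.
        keep = max(1, int(len(examples) * fraction))
        sampled.extend(rng.sample(examples, keep))

    # Build a new dict rather than mutating the input; other keys
    # (e.g. entity_synonyms) are carried over unchanged.
    nlu = dict(data["rasa_nlu_data"])
    nlu["common_examples"] = sampled
    return {"rasa_nlu_data": nlu}
```

You could load the generated file with `json.load`, call this with a small fraction such as 0.01, write the result back with `json.dump`, and train on that. Raising the fraction step by step should show roughly where memory starts to become a problem.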
It is fine with a small dataset. I used Chatito, and its output has 11M records in total, of which I take about 70%. It created a JSON file, which I converted to an md file, and then I ran rasa train.
So I think the issue is with my pipeline … Please review my config and comment on the pipeline. If it is good for my language, Vietnamese, then I will try increasing my memory.
I’m not familiar with Vietnamese, so I can’t comment on the features for the CRFEntityExtractor (you’d probably know better than me whether they make sense), but using a Vietnamese-specific language model for tokenization/featurization is a good start. Note that Rasa 1.8 also introduces some other options for language models.
Glad you got it to work. I’d recommend taking a look at your options here (you’ll have to check what works for Vietnamese), comparing pipelines that use different components, and seeing what gives you the best results. This will depend on your data, so the best approach is to experiment and see what works.