Hi guys, I am training a Rasa NLU model in Chinese, and the training time is very long. I think it is due to sentence parsing. The training data has 600 examples with 10 intents and 11 entity labels, and training takes more than 15 hours to finish. Is there any way to speed this up? We need to update the model regularly. Thanks!
You can try the spaCy backend instead of MITIE in your model configuration file.
Really? But I don't think spaCy supports Chinese.
@Qianyue I have a project that builds spaCy models for the Chinese language at https://github.com/howl-anderson/Chinese_models_for_SpaCy. I hope it can solve your problem.
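For reference, a Rasa NLU configuration using a spaCy backend might look like the sketch below. This is only an illustration: the `zh` language code and the exact component list are assumptions based on the standard spaCy pipeline template, and depend on how the Chinese model is linked on your machine.

```yaml
language: "zh"

pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"
- name: "ner_crf"
- name: "ner_synonyms"
```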
Can you give more details about how you train your Chinese NLU model? The shell command or script contents would be very helpful. Usually it doesn't take much time, but if you are building a MITIE model, that will take a very long time. So if you can provide more details about how you train the model, we can locate the problem more quickly.
Thank you!! I will definitely try this method. Did you use this model to build a Rasa NLU model and make a dialogue bot?
I am using an existing MITIE model for testing, but the Rasa NLU training time is still long with the data size I described above. And we will add much more training data later.
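Before changing the pipeline, it may help to time each training run precisely so you can compare configurations. A minimal sketch using only the Python standard library; `train_nlu` here is a placeholder standing in for your actual Rasa NLU training call, not a real API:

```python
import time

def train_nlu():
    # Placeholder for the real training call: in practice this would
    # build a trainer from your config and train on your data.
    time.sleep(0.1)

start = time.perf_counter()
train_nlu()
elapsed = time.perf_counter() - start
print(f"training took {elapsed:.1f} s")
```

Running this around each configuration (MITIE vs. spaCy, full vs. partial dataset) makes it easier to see which component dominates the runtime.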
I just followed the official guide to build the model, using the pre-configured MITIE pipeline but replacing the tokenizer with tokenizer_jieba. The pipeline is below:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
That's weird. Can you provide more details about your computer: OS, CPU, memory, hard disk type (SSD?)? And what versions of Python and Rasa NLU are you using?
Sure! I am using a MacBook Pro (2017) with a 3.1 GHz Intel Core i5 CPU and 8 GB of memory. Do you know the average training time for the jieba-MITIE model? Thanks!
I didn't record the training time, but I remember it was pretty short. Can you train your model on the Weather (Chinese version) dataset from https://github.com/howl-anderson/NLU_benchmark_dataset? Then we will be using the same dataset and can find out why.
Thanks!! I just tried this dataset; the training time was 1396 seconds. I am thinking the problem might be the distribution of my training data?
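One quick check on the distribution idea is to count examples per intent. A small standard-library sketch; the intent labels below are made up for illustration, and in practice you would collect them from the intent field of each example in your training data file:

```python
from collections import Counter

# Hypothetical intent labels; replace with the labels read from
# your actual Rasa NLU training data.
intents = ["greet", "greet", "weather", "weather", "weather", "goodbye"]

counts = Counter(intents)
total = sum(counts.values())
for intent, n in counts.most_common():
    print(f"{intent}: {n} examples ({n / total:.0%})")
```

A heavily skewed distribution (one intent with most of the examples) is easy to spot this way and is worth ruling out before digging into the pipeline itself.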
We still need more details. Please add my WeChat account (here-we-meet); using IM to communicate is a better choice for this case.