Slow Training Chinese NLU Model

rasa-nlu

(Qianyue Zhang) #1

Hi guys, I am training a Rasa NLU model in Chinese. The training time is very long; I think it is due to sentence parsing. The training data has 600 examples with 10 intents and 11 labels, and it takes more than 15 hours to finish. Is there any way to speed this up? We need to update the model regularly. Thanks!


(马健) #2

You can try spaCy instead of MITIE in your model configuration file.
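For reference, a spaCy-based pipeline for Chinese would look roughly like the sketch below. This is only an illustration: it assumes a Chinese spaCy model has been installed and linked under the language shortcut "zh", since spaCy does not ship an official Chinese model (see the project linked in post #5).

language: "zh"

pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "intent_classifier_sklearn"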


(Akela Drissner) #3

@howlanderson you might have some insight on this as well?


(Qianyue Zhang) #4

Really? But I think spaCy doesn’t support Chinese.


(Xiaoquan Kong) #5

@Qianyue I have a project that builds spaCy models for the Chinese language at https://github.com/howl-anderson/Chinese_models_for_SpaCy. I hope it can solve your problem.
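Rough install steps look something like the commands below; the package name and version are placeholders, so please check the repository README for the actual release artifact.

# install the released Chinese model package (file name is illustrative)
pip install ./zh_core_web_sm-x.x.x.tar.gz

# link it to the shortcut "zh" so nlp_spacy can load it via language: "zh"
python -m spacy link zh_core_web_sm zh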


(Xiaoquan Kong) #6

@akelad OK


(Xiaoquan Kong) #7

Can you give more details about how you train your Chinese NLU model? The shell command or script contents would be very helpful. Usually it doesn’t take much time, but if you are building a MITIE model from scratch, that will take a very long time. So if you can provide more details about how you train the model, we can locate the problem more quickly.
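For example, with the legacy rasa_nlu command-line interface a training run usually looks something like this (the file names are placeholders):

python -m rasa_nlu.train \
  --config config.yml \
  --data data/training_data.json \
  --path models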


(Qianyue Zhang) #8

Thank you!! I will definitely try this method. Did you use this model to build a Rasa NLU model and make a dialogue bot?


(Qianyue Zhang) #9

I am using an existing MITIE model for testing, but the Rasa NLU training time is still long with the data size I described above. And we will add much more training data later.

I just followed the official guide to build the model: I used the pre-configured MITIE pipeline but replaced the tokenizer with "tokenizer_jieba". The pipeline is below:

language: "zh"

pipeline:
- name: "nlp_mitie"
  model: "data/total_word_feature_extractor_zh.dat"
- name: "tokenizer_jieba"
- name: "ner_mitie"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_mitie"
- name: "intent_classifier_sklearn"
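Note that the ner_mitie and intent_featurizer_mitie components are usually what makes training with this kind of pipeline slow, because MITIE's training is computationally expensive. A commonly suggested faster alternative for Chinese is a tensorflow_embedding-style pipeline with the jieba tokenizer, which avoids MITIE entirely. A rough sketch, assuming the same legacy rasa_nlu component names:

language: "zh"

pipeline:
- name: "tokenizer_jieba"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"

This trains intent embeddings and a CRF entity extractor from your own data, so training time scales with the dataset size rather than with MITIE's optimization.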

(Xiaoquan Kong) #10

That’s weird. Can you provide more details about your computer: OS, CPU, memory, hard disk type (SSD?)? And what versions of Python and Rasa NLU are you using?


(Qianyue Zhang) #12

Sure! I am using a MacBook Pro 2017, with a 3.1 GHz Intel Core i5 CPU and 8 GB of memory. Do you know the average training time for the jieba-MITIE model? Thanks!


(Xiaoquan Kong) #13

I didn’t record the training time, but I remember it was pretty short. Can you train your model on the Weather dataset (Chinese version) from https://github.com/howl-anderson/NLU_benchmark_dataset, so we are using the same dataset and can find out why?


(Qianyue Zhang) #14

Thanks!! I just tried this dataset, and the training time was 1396 seconds. So maybe the problem is the distribution of my training data?


(Xiaoquan Kong) #15

This still needs more details. Please add my WeChat account: here-we-meet; using IM to communicate is a better choice for this case.