Loading a custom jieba dictionary takes too much time

Hi, I am new to Rasa. I trained a model using the following config:

language: "zh"
pipeline:
  - name: "JiebaTokenizer"
    dictionary_path: "userdict/"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"

In my case, I put a txt file of about 100 MB in dictionary_path, containing a large number of movie names and star names. So every time I restart the server, or fetch a new model from the server URL I set in endpoints.yml, Rasa takes about 10 seconds to respond to the next request. What is worse, I deploy my model with Docker in about 30 containers, so I get 30 very slow requests each time I train a new model. My jieba user dictionary changes very often because I have to add new movie names and star names, so I have to train and fetch a new model very often. Can anyone help me with this?
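For anyone who wants to measure the delay described above on each container, here is a minimal sketch that sends one throwaway message to every server and records how long it took. It assumes the servers expose Rasa's HTTP API (`POST /model/parse`) and that the base URLs are known; the function names `post_parse` and `warm_up` are made up for this example. Whether a single request like this absorbs the whole dictionary-loading delay is an assumption, not something confirmed in this thread.

```python
import json
import time
import urllib.request


def post_parse(base_url, text, timeout=60.0):
    """POST one message to the /model/parse endpoint and return the decoded JSON."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/model/parse",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def warm_up(base_urls, text="你好", send=post_parse):
    """Send one throwaway message per server.

    Returns a dict mapping each URL to the seconds the request took,
    or None if the request failed (connection error, timeout, HTTP error).
    """
    timings = {}
    for url in base_urls:
        start = time.perf_counter()
        try:
            send(url, text)
            timings[url] = time.perf_counter() - start
        except OSError:  # urllib's URLError/HTTPError and socket timeouts are OSErrors
            timings[url] = None
    return timings
```

It could be called with the list of container URLs (hypothetical hosts here), e.g. `warm_up(["http://container-1:5005", "http://container-2:5005"])`, right after a new model has been picked up, so the slow first request is not a real user's request.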

I don’t think there is anything from the rasa side that you can do about this as you will have to retrain whenever you update the dictionary. If possible it’d be best to reduce the size of the file. Why is it so large?
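If the dictionary has grown by appending new names over time, one thing worth checking is whether it contains repeated entries. The sketch below merges every `.txt` file in a directory into a single file with exact duplicate lines removed and reports how many entries were dropped. The function name `merge_dictionaries` and the paths are made up for this example, and it only helps if the dictionary really does contain duplicates.

```python
from pathlib import Path


def merge_dictionaries(src_dir, dst_file):
    """Merge all .txt files in src_dir into dst_file, dropping duplicate lines.

    Lines are compared after stripping surrounding whitespace, so an entry
    with a different frequency or tag counts as a different line.
    Returns (total_entries_read, unique_entries_written).
    """
    dst = Path(dst_file).resolve()
    seen = set()
    total = 0
    with open(dst, "w", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.txt")):
            if path.resolve() == dst:
                continue  # never read the file we are writing to
            with open(path, encoding="utf-8") as src:
                for line in src:
                    entry = line.strip()
                    if not entry:
                        continue  # skip blank lines
                    total += 1
                    if entry not in seen:
                        seen.add(entry)
                        out.write(entry + "\n")
    return total, len(seen)
```

For example, `total, unique = merge_dictionaries("userdict", "userdict_merged/all.txt")` (the output directory must already exist) would show how much of the dictionary is redundant before pointing `dictionary_path` at the merged copy.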

Thanks for the reply. I think I will have to look for a solution elsewhere.

Glad to hear it 👍 If you think your solution could help others, would you mind outlining it here?