Why is a local path being added to the generated intent_featurizer_count_vectors.pkl?

tommetcalfe
rasa-nlu
(Libindavis) #1

Hi Community,

I am using TensorFlow and CountVectorizer to featurize the input text.
Here is my pipeline configuration:

language: "en"
pipeline:
    - name: "nlp_spacy"
    - name: "tokenizer_spacy"
    - name: "intent_featurizer_count_vectors"
      stop_words: ['how','what','hows','is','the','whats']
      min_df: 0.0
      max_df: 1.0
      min_ngram: 1
      max_ngram: 2
    - name: "intent_entity_featurizer_regex"
    - name: "ner_crf"
      BILOU_flag: true
      features: [["low",'title'],
           ["bias", "low", "title","pos",'pattern','prefix5','prefix2', 'suffix5', 'suffix3'],
           ["low", "title"]]
    - name: "ner_duckling_http"
      url: "http://localhost:8000"
      dimensions: ["time", "number", "duration", "ordinal"]
      locale: "en_US"
      timezone: "US/Pacific"
    - name: "ner_synonyms"
    - name: "intent_classifier_tensorflow_embedding"

But when I run my model in a Docker container, it tries to access my local file /home/<my_local_path>/rasa_nlu/rasa_nlu/featurizers/count_vectors_featurizer.py rather than the corresponding file inside the container, which is /app/rasa_nlu_chatbot/rasa_nlu/rasa_nlu/featurizers/count_vectors_featurizer.py. This leads to a strange error message like:

File "/usr/local/lib/python3.5/site-packages/sklearn/feature_extraction/text.py", line 266, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "/home/<my_local_path>/rasa_nlu/rasa_nlu/featurizers/count_vectors_featurizer.py", line 140, in _tokenizer

NameError: name 'T' is not defined

After some debugging, I found that the generated intent_featurizer_count_vectors.pkl contains a string with my local path: /home/<my_local_path>/rasa_nlu/rasa_nlu/featurizers/count_vectors_featurizer.py.

My question is: why is the model in the Docker container trying to access my local path? Why is a local path in the generated model at all?

My rasa_nlu version is "0.14.0a1".

(Libindavis) #2

Below is a snippet of the generated intent_featurizer_count_vectors.pkl. As you can see, my local path is referenced in it.

MethodType~T~E~TR~Th,~L^N_fill_function~T~S~T(h,~L^O_make_skel_func~T~S~Th.~L^HCodeType~T~E~TR~T(K^BK^@K^DK^DK^CCtt^@j^Ad^Ad^B|^A~C^C}^At^@j^B~H^@j^C~C^A}^B|^Bj^D|^A~C^A}

^C~H^@j^Erpt^F~H^@j^Gd^C~C^BrX~H^@j^E~H^@j^Gj^Hk^Frp~G^@f^Ad^Dd^E~D^H|^CD^@~C^A}^Cn^X~H^@j rp~G^@f^Ad^Fd^E~D^H|^CD^@~C^A}^C|^CS^@~T(~L%Override tokenizer in CountVecto rizer~T~L \b[0-9]+\b~T~L

__NUMBER__~T~L^Kvocabulary_~Th8(K^AK^@K^BK^DK^SC&g^@|^@]^^}^A|^A~H^@j^@j^Aj^B~C^@k^Fr^|^An^D~H^@j^C~Q^Bq^DS^@~T (h^^h=~L^Dkeys~Th^Xt~T~L^B.0~T~L^At~T~F~T~L~Y /home/<my_local_path>/rasa_nlu/rasa_nlu/featurizers/count_vectors_featurizer.py ~T~L

~TK~VC^B^F^A~T~L^Dself~T~E~T)t~TR~T~L5CountVectorsFeaturizer._tokenizer..

~Th8(K^AK^@K^BK^DK^SC g^@|^@]^X}^A|^A~H^@j^@k^Fr^X~H^@j^An^B|^A~Q^Bq^ DS^@~T)h^Yh^X~F~ThAhB~F~ThDhEK~\C^B^F^A~ThG~E~T)t~TR~Tt~T(~L^Bre~T~L^Csub~T~L^Gcompile~Th^G~L^Gfindall~Th^X~L^Ghasattr~Th^^h=h^Yt~T(hG~L^Dtext~Th^G~L^Ftokens~Tt~ThD~L _tokenizer~TK~JC^X^@^B^N^B^L^A
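A raw dump like the one above is hard to read. As a rough sketch, the string arguments stored in a pickle can be listed with the standard-library pickletools module, which parses the stream without executing it. The dictionary and the path below are hypothetical stand-ins for the real model file; with the real file, the bytes would be read from intent_featurizer_count_vectors.pkl instead.

```python
import pickle
import pickletools


def find_strings(payload, needle):
    """Return every string/bytes opcode argument in a pickle stream containing `needle`."""
    hits = []
    # genops only parses the opcodes; nothing in the pickle is executed.
    for _opcode, arg, _pos in pickletools.genops(payload):
        if isinstance(arg, bytes):
            arg = arg.decode("utf-8", errors="replace")
        if isinstance(arg, str) and needle in arg:
            hits.append(arg)
    return hits


# Hypothetical stand-in for the real model file contents.
payload = pickle.dumps({"source": "/home/user/project/module.py", "n": 3})
hits = find_strings(payload, "/home/")
print(hits)
```

Searching for "/home/" (or any other prefix of the training machine's path) should surface the embedded file name if one is present.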

(Tom Metcalfe) #3

Hi @libindavis,

Could you try training the featurizer with no stop words and then test whether that gives the same error? My best guess is that we are improperly pickling the model, which we are currently working on fixing.

Let me know if that doesn’t solve it.
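On the question of why a path is in the file at all: the _fill_function, _make_skel_func and CodeType names visible in the dump look like cloudpickle internals, which would suggest that the featurizer's tokenizer function was serialized by value, code object included. A Python code object records the file it was compiled from in co_filename, and tracebacks print that recorded name, so the /home/... line in the traceback does not necessarily mean the container opened that file. The sketch below illustrates the effect using only the standard-library marshal module as a stand-in. It is not what rasa_nlu does internally, and the path and function are hypothetical.

```python
import marshal

# Compile a tiny function as if it lived at a hypothetical local path.
source = "def tokenizer(text):\n    return text.split()\n"
fake_path = "/home/user/project/featurizer.py"
namespace = {}
exec(compile(source, fake_path, "exec"), namespace)
func = namespace["tokenizer"]

# The code object remembers the file it was compiled from.
print(func.__code__.co_filename)

# Serializing the code object by value carries that path along in the bytes.
payload = marshal.dumps(func.__code__)
path_in_payload = fake_path.encode() in payload
print(path_in_payload)

# After loading (for example on another machine), the old path is still attached.
restored = marshal.loads(payload)
print(restored.co_filename)
```

If this is indeed what is happening, the path itself is harmless metadata, and the NameError is more likely caused by the restored function missing a global it relied on at training time.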