Rasa BERT, load weights from cache

Hi everyone,

Due to some restrictions, I am not allowed to load the weights etc. from external sources during BERT training. So I pre-loaded them and stored them in a cache directory that Rasa can access. When I start the model locally, it works fine (and loads the weights etc. for the model from the cache). However, when I do the same thing inside a Docker container, I get the following error:

INFO transformers.modeling_tf_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-tf_model.h5 from cache at None
ERROR    rasa.core.agent  - Could not load model due to stat: path should be string, bytes, os.PathLike or integer, not NoneType.

So far this only happens with the weights; the vocab and config files are loaded without any problems.

Has anyone had the same issue before? Or maybe someone has an idea how to fix it?

Appreciate any help!

Hey @almois, are you setting the location of the weights through an environment variable, or some other way?

Hi @akelad, thanks for responding. I am supporting @almois on this topic.

We are setting the weights location through the config.yml file

pipeline:
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "bert-base-multilingual-uncased"
    cache_dir: lfs

To elaborate on the issue described by @almois: we are running Rasa as a Docker container, and it looks like a machine/environment-specific issue, as the error is thrown in one environment but not in another. As we understand it, the error is the following: rasa.core.agent is trying to load a file from a path which is set to None. Please see the logs below.

DEBUG    rasa.nlu.utils.hugging_face.hf_transformers  - Loading Tokenizer and Model for bert
INFO     transformers.tokenization_utils - loading file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-vocab.txt from cache at lfs/bb773818882b0524dc53a1b31a2cc95bc489f000e7e19773ba07846011a6c711.535306b226c42cebebbc0dabc83b92ab11260e9919e21e2ab0beb301f267b4c7
INFO     transformers.configuration_utils - loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-config.json from cache at lfs/33b56ce0f312e47e4d77a57791a4fc6233ae4a560dd2bdd186107058294e58ab.c7892120c5a9b21e515abc904e398dbabddf9510b122f659063cbf361fe16868
.
.
.
INFO     transformers.modeling_tf_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual-uncased-tf_model.h5 from cache at None
ERROR    rasa.core.agent  - Could not load model due to stat: path should be string, bytes, os.PathLike or integer, not NoneType.

As you can see in the logs, transformers.tokenization_utils and transformers.configuration_utils were able to locate the folder lfs, which contains the cached BERT files. transformers.modeling_tf_utils, on the other hand, was not: its cache path resolved to None.
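The error message itself matches what happens when `os.stat()` receives `None`: if a cache lookup finds nothing, the resolved path is `None`, and that value eventually reaches a filesystem call. A minimal reproduction in plain Python (no Rasa or transformers needed; `resolve_from_cache` is a hypothetical stand-in for the cache lookup, not the real transformers function):

```python
import os

def resolve_from_cache(cache_dir, filename):
    """Hypothetical stand-in for a cache-only lookup: return the local
    path if the file exists in cache_dir, otherwise None (no network
    fallback)."""
    path = os.path.join(cache_dir, filename)
    return path if os.path.exists(path) else None

# The weights file is not in the cache, so the lookup yields None.
weights_path = resolve_from_cache("lfs", "tf_model.h5")

try:
    os.stat(weights_path)  # effectively os.stat(None)
except TypeError as err:
    # Same message that rasa.core.agent surfaces in the logs above:
    # "stat: path should be string, bytes, os.PathLike or integer, not NoneType"
    print(err)
```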

Please let me know if you need to have a look at the Dockerfile.

SOLVED

When we run the Docker image, Rasa still needs to download some files (in addition to the ones already cached), and if there is no internet access, the cache lookup for those files resolves to None, which causes the error above.
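One way to catch this before starting the container is a preflight check of the cache directory. In the transformers versions from that era, each cached blob sits next to a small `*.json` metadata file recording the original `url` and `etag`, so a script can verify whether the weights file is actually cached. A sketch (the directory name `lfs` is taken from the config above; the metadata layout is an assumption about older transformers cache formats):

```python
import glob
import json
import os

def cached_urls(cache_dir):
    """List the original URLs recorded in the cache's *.json metadata
    files (one per cached blob in older transformers cache layouts)."""
    urls = []
    for meta_path in glob.glob(os.path.join(cache_dir, "*.json")):
        with open(meta_path) as fh:
            meta = json.load(fh)
        if "url" in meta:
            urls.append(meta["url"])
    return urls

def weights_are_cached(cache_dir):
    """True if some cached entry looks like a TF weights file."""
    return any(url.endswith("tf_model.h5") for url in cached_urls(cache_dir))

if __name__ == "__main__":
    cache_dir = "lfs"  # the cache_dir from config.yml
    if weights_are_cached(cache_dir):
        print("weights cached; safe to start without network access")
    else:
        print("tf_model.h5 not cached; Rasa will try to download it at startup")
```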


How does one resolve this issue when you train a BERT model in one Docker environment and then curl the resulting model to Rasa X in a different Docker container?

So there is no way to use HFTransformersNLP without network access?

If you specify a cache_dir, you should only need to download it once. See the Components documentation.

@akelad, I am facing a similar issue. Whenever I specify a cache_dir it downloads the files, but when I turn off my internet I get a connection error saying it can't load the model weights because they are not cached. Shouldn't it load from the cache_dir?

I am using Rasa 2.6, and I am not using LanguageModelTokenizer and HFTransformersNLP, since the docs say they are deprecated. Is something wrong with my config, maybe?

Here is my config file.

# Configuration for Rasa NLU.

pipeline:
  - name: sentiment_ex.SentimentEmotionAnalyzerNLTK
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  # - name: CountVectorsFeaturizer
  #   # Analyzer to use, either 'word', 'char', or 'char_wb'
  #   "analyzer": "word"
  #   # Set the lower and upper boundaries for the n-grams
  #   "min_ngram": 1
  #   "max_ngram": 3
  #   # Set the out-of-vocabulary token
  #   "OOV_token": "_oov_"
  #   # Whether to use a shared vocab
  #   "use_shared_vocab": False
  - name: LanguageModelFeaturizer
    # Name of the language model to use
    model_name: "bert"
    # Pre-trained weights to be loaded
    model_weights: "rasa/LaBSE"
    cache_dir: /dir_name/model_name
  - name: RegexEntityExtractor
    case_sensitive: False
    use_lookup_tables: True
  - name: "DucklingEntityExtractor"
    # url of the running duckling server
    url: "http://localhost:8000"
    # dimensions to extract
    dimensions: ["time", "email"]
    timeout: 3
  - name: DIETClassifier
    epochs: 150
    constrain_similarities: true
    # model_confidence: linear_norm
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: faq
    # scale_loss: false
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: chitchat
    # scale_loss: false
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: inform
    scale_loss: false
  - name: FallbackClassifier
    threshold: 0.70
    ambiguity_threshold: 0.1

I just want to train the model without using the internet :sob:. And during debugging, why is it making API requests to Hugging Face?