Loading BERT language model weights offline


When we run “rasa train” on our server, we get the error below.

“OSError: Model name ‘bert-base-uncased’ was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed ‘bert-base-uncased’ was a path, a model identifier, or url to a directory containing vocabulary files named [‘vocab.txt’] but couldn’t find such vocabulary files at this path or url.”

Previously we got a “https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-tf_model.h5 from cache during training.” error, so we downloaded the weight files to our server. We then tried running training offline, and also tried changing the location path in “class PreTrainedTokenizer(object)” in the “tokenization_utils.py” file. I have the following questions. Can someone help, please?

  1. What do we need to do, step by step, to run “bert-base-uncased” offline on our server?

  2. Is this good practice?
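
For reference, here is a minimal sketch of what loading from a local folder might look like, instead of editing “tokenization_utils.py”. It assumes the downloaded files sit in one directory and have been renamed to the names `from_pretrained()` looks for in a local path (`config.json`, `vocab.txt`, `tf_model.h5`); the files downloaded from S3 carry a `bert-base-uncased-` prefix, so they would need renaming. The helper names `missing_files` and `load_local_bert` are made up for illustration, and whether Rasa accepts such a directory in place of the model name should be checked against the Rasa version in use.

```python
from pathlib import Path

# File names that from_pretrained() expects inside a local directory
# (TensorFlow weights, matching the tf_model.h5 file mentioned above).
REQUIRED_FILES = ("config.json", "vocab.txt", "tf_model.h5")


def missing_files(model_dir):
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


def load_local_bert(model_dir):
    """Load the tokenizer and TF model from a local directory (hypothetical helper)."""
    missing = missing_files(model_dir)
    if missing:
        raise FileNotFoundError(f"{model_dir} is missing: {', '.join(missing)}")
    # Imported here so the file check above works without transformers installed.
    from transformers import BertTokenizer, TFBertModel

    # Passing a directory path makes from_pretrained() read the local files.
    tokenizer = BertTokenizer.from_pretrained(str(model_dir))
    model = TFBertModel.from_pretrained(str(model_dir))
    return tokenizer, model
```

Running `missing_files("/path/to/bert-base-uncased")` first (the path is a placeholder) shows which files still need to be copied or renamed before trying the load.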



Any news on this?