Model Loading Error and Cache Directory Problems - Issues Upgrading Rasa from 3.4.0 to 3.6.2

Our team is running into issues upgrading our open-source Rasa installation from 3.4.0 to 3.6.2. Training works with the versions listed below, but when the model is loaded from cache in deployment we hit the following error. The LFS files are copied into the container after training, which worked with our Rasa 3.4.0 setup. It looks like those files are no longer being picked up, which causes Rasa to try to fetch the model directly from Hugging Face.
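
For reference, a quick sketch of the kind of check we can run inside the deployment container to confirm the LFS files actually end up where the featurizer's cache_dir points. The "lfs" path is taken from our config below; treating it as relative to the Rasa working directory is an assumption about our own layout, not something Rasa prescribes.

import os

# Hypothetical sanity check: list what is actually present in the cache_dir
# configured for LanguageModelFeaturizer ("lfs" in the config further down).
cache_dir = "lfs"  # assumption: relative to the Rasa working directory in the container
for root, _dirs, files in os.walk(cache_dir):
    for name in files:
        path = os.path.join(root, name)
        print(path, os.path.getsize(path), "bytes")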

Error

Extracted model to '/tmp/tmpnw0t3his'.
python-3.10.14/lib/python3.10/site-packages/rasa/shared/core/slot_mappings.py:224: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
  rasa.shared.utils.io.raise_warning(
Node 'nlu_message_converter' loading 'NLUMessageConverter.load' and kwargs: '{}'.
Node 'run_WhitespaceTokenizer0' loading 'WhitespaceTokenizer.load' and kwargs: '{}'.
Node 'run_LanguageModelFeaturizer1' loading 'LanguageModelFeaturizer.load' and kwargs: '{}'.
There was a problem when trying to write in your cache folder (/home/containeruser/.cache/huggingface/hub). You should set the environment variable TRANSFORMERS_CACHE to a writable directory.
Loading Tokenizer and Model for bert
python-3.10.14/lib/python3.10/site-packages/huggingface_hub/file_download.py:797: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Starting new HTTPS connection (1): huggingface.co:443
Starting new HTTPS connection (2): huggingface.co:443
Starting new HTTPS connection (3): huggingface.co:443
Starting new HTTPS connection (4): huggingface.co:443
Could not load model due to Error initializing graph component for node run_LanguageModelFeaturizer1..
Traceback (most recent call last):
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/graph.py", line 403, in _load_component
    self._component: GraphComponent = constructor(  # type: ignore[no-redef]
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/graph.py", line 221, in load
    return cls.create(config, model_storage, resource, execution_context)
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/nlu/featurizers/dense_featurizer/lm_featurizer.py", line 100, in create
    return cls(config, execution_context)
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/nlu/featurizers/dense_featurizer/lm_featurizer.py", line 67, in __init__
    self._load_model_instance()
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/nlu/featurizers/dense_featurizer/lm_featurizer.py", line 152, in _load_model_instance
    self.tokenizer = model_tokenizer_dict[self.model_name].from_pretrained(
  File "/python-3.10.14/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1788, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'dbmdz/bert-base-german-uncased'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'dbmdz/bert-base-german-uncased' is the correct path to a directory containing all relevant files for a BertTokenizer tokenizer.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/core/agent.py", line 254, in load_agent
    agent.load_model(model_path)
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/core/agent.py", line 352, in load_model
    self.processor = MessageProcessor(
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/core/processor.py", line 105, in __init__
    self.model_filename, self.model_metadata, self.graph_runner = self._load_model(
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/core/processor.py", line 142, in _load_model
    metadata, runner = loader.load_predict_graph_runner(
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/loader.py", line 29, in load_predict_graph_runner
    runner = graph_runner_class.create(
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/runner/dask.py", line 51, in create
    return cls(graph_schema, model_storage, execution_context, hooks)
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/runner/dask.py", line 37, in __init__
    self._instantiated_nodes: Dict[Text, GraphNode] = self._instantiate_nodes(
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/runner/dask.py", line 60, in _instantiate_nodes
    return {
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/runner/dask.py", line 61, in <dictcomp>
    node_name: GraphNode.from_schema_node(
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/graph.py", line 566, in from_schema_node
    return cls(
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/graph.py", line 392, in __init__
    self._load_component()
  File "/python-3.10.14/lib/python3.10/site-packages/rasa/engine/graph.py", line 416, in _load_component
    raise GraphComponentException(
rasa.engine.exceptions.GraphComponentException: Error initializing graph component for node run_LanguageModelFeaturizer1.
Rasa server is up and running.
No agent loaded. To continue processing, a model of a trained agent needs to be loaded.
No agent loaded. To continue processing, a model of a trained agent needs to be loaded.
No agent loaded. To continue processing, a model of a trained agent needs to be loaded.
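
The first warning in the log ("There was a problem when trying to write in your cache folder") suggests the container user cannot write to the default Hugging Face cache, after which transformers falls back to contacting huggingface.co. Below is a minimal sketch of the check we are considering; the default path is taken verbatim from the warning, while "/app/lfs" is a hypothetical writable directory in our container. In a real deployment the variable would be set in the container environment before Rasa starts, not from Python.

import os

# Path taken verbatim from the warning in the log above.
default_cache = "/home/containeruser/.cache/huggingface/hub"
print("default cache exists:", os.path.isdir(default_cache))
print("default cache writable:", os.access(default_cache, os.W_OK))

# Assumption: redirect the cache to a writable location, as the warning suggests.
# "/app/lfs" is a hypothetical path inside our container, not a Rasa default.
os.environ["TRANSFORMERS_CACHE"] = "/app/lfs"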

Set Up

rasa==3.6.20
rasa[transformers]
spacy==3.4.0
tensorflow==2.12.0
websockets==10.0

Config

recipe: default.v1
language: de
assistant_id: id-of-bot
pipeline:
  - name: "WhitespaceTokenizer"
      # Flag to check whether to split intents
    intent_tokenization_flag: false
      # Symbol on which intent should be split
    intent_split_symbol: "_"
  - name: LanguageModelFeaturizer
    alias: "pretrained"
      # Name of the language model to use
    model_name: "bert"
      # Pre-Trained weights to be loaded
    model_weights: "dbmdz/bert-base-german-uncased"
    cache_dir: lfs
  - name: "RegexFeaturizer"
    alias: "regex-featurizer"
    case_sensitive: false
  - name: LexicalSyntacticFeaturizer
    alias: "lexical-syntactic"
    # CountVectorsFeaturizer on word level
  - name: CountVectorsFeaturizer
    alias: "cvf-word"
    analyzer: word
    # CountVectorsFeaturizer within word boundaries looking at n-grams
  - name: CountVectorsFeaturizer
    alias: "cvf-char"
    analyzer: char_wb
    strip_accents: 'ascii'
    min_ngram: 2
    max_ngram: 5
  - name: RegexEntityExtractor
      # text will be processed with case insensitive as default
    case_sensitive: false
      # use lookup tables to extract entities
    use_lookup_tables: true
      # use regexes to extract entities
    use_regexes: true
      # use match word boundaries for lookup table
    "use_word_boundaries": true
  - name: "DIETClassifier"
    epochs: 100
    featurizers: ["cvf-char", "cvf-word", "pretrained", regex-featurizer]
    number_of_transformer_layers: 4
    transformer_size: 256
    embedding_dimension: 30
    constrain_similarities: true
    random_seed: 14
    learning_rate: 0.002
  - name: "EntitySynonymMapper"
    # NLU Fallback
  - name: "FallbackClassifier"
    threshold: 0.7
    ambiguity_threshold: 0.1
    # Response Selector for Chitchat
  - name: ResponseSelector
    epochs: 100
    featurizers: ["cvf-char", "cvf-word", "pretrained"]
    retrieval_intent: chitchat

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/

policies:
  - name: "RulePolicy"
    core_fallback_threshold: 0.3
    core_fallback_action_name: "action_custom_fallback"
    enable_fallback_prediction: true
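
To narrow down whether the problem is the pipeline config or the container, we also plan to run a standalone offline load that mirrors what LanguageModelFeaturizer does with model_weights and cache_dir. This is only a sketch: cache_dir and local_files_only are standard transformers arguments, but exactly how Rasa resolves cache_dir at load time is an assumption on our side. If this raises the same OSError, the files are not visible or not complete inside the container; if it succeeds, the issue is more likely the cache path resolved when the model is loaded.

from transformers import AutoTokenizer

# Hypothetical offline check, run inside the deployment container.
# cache_dir mirrors the LanguageModelFeaturizer setting in the config above;
# local_files_only makes transformers fail fast instead of contacting huggingface.co.
tokenizer = AutoTokenizer.from_pretrained(
    "dbmdz/bert-base-german-uncased",
    cache_dir="lfs",
    local_files_only=True,
)
print("tokenizer resolved from local cache:", type(tokenizer).__name__)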