Training fails when using HFTransformersNLP Rasa X

k1m · May 24, 2020, 2:44pm

Rasa X 0.28.3
Rasa 1.10.0

I’m trying to use HFTransformersNLP, but training seems to fail: no new models are created, and the logs show nothing related to it.

I have set up automated testing, and the GitHub model CI shows that HFTransformersNLP loads and training succeeds there.

pip install rasa[transformers] is needed for HFTransformersNLP when doing this locally, but for Rasa X running on the server I have only set the version in .env.

My Rasa X server already runs rasa/rasa:1.10.0-full, so is there anything else that could cause this?

ricwo · May 26, 2020, 9:18am

Hi @k1m, you’re right, the -full image contains all the necessary dependencies. How did you deploy Rasa X? Could you please try the following:

1. Run your rasa service with the --debug option (set debugMode: true in values.yaml if you’re using the Helm deployment).
2. Train a model again using transformers and check the container logs.

k1m · May 26, 2020, 2:13pm

Thanks for taking a look at this @ricwo . I did the Docker-Compose Quick Install.

Here is what I found:

2020-05-26T11:22:06.659187273Z 2020-05-26 11:22:06 DEBUG    rasa.nlu.utils.hugging_face.hf_transformers  - Loading Tokenizer and Model for bert
2020-05-26T11:22:06.663170053Z 2020-05-26 11:22:06 DEBUG    rasa.server  - Traceback (most recent call last):
2020-05-26T11:22:06.663199295Z   File "/opt/venv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 383, in _from_pretrained
2020-05-26T11:22:06.663206018Z     resume_download=resume_download,
2020-05-26T11:22:06.663212461Z   File "/opt/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 238, in cached_path
2020-05-26T11:22:06.663218073Z     user_agent=user_agent,
2020-05-26T11:22:06.663222570Z   File "/opt/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 349, in get_from_cache
2020-05-26T11:22:06.663227335Z     os.makedirs(cache_dir, exist_ok=True)
2020-05-26T11:22:06.663231817Z   File "/usr/local/lib/python3.7/os.py", line 213, in makedirs
2020-05-26T11:22:06.663249950Z     makedirs(head, exist_ok=exist_ok)
2020-05-26T11:22:06.663254036Z   File "/usr/local/lib/python3.7/os.py", line 213, in makedirs
2020-05-26T11:22:06.663258076Z     makedirs(head, exist_ok=exist_ok)
2020-05-26T11:22:06.663261940Z   File "/usr/local/lib/python3.7/os.py", line 223, in makedirs
2020-05-26T11:22:06.663265949Z     mkdir(name, mode)
2020-05-26T11:22:06.663269768Z PermissionError: [Errno 13] Permission denied: '/.cache'
2020-05-26T11:22:06.663275235Z
2020-05-26T11:22:06.663279015Z During handling of the above exception, another exception occurred:
2020-05-26T11:22:06.663282886Z
2020-05-26T11:22:06.663286635Z Traceback (most recent call last):
2020-05-26T11:22:06.663290444Z   File "/opt/venv/lib/python3.7/site-packages/rasa/server.py", line 808, in train
2020-05-26T11:22:06.663294377Z     None, functools.partial(train_model, **info)
2020-05-26T11:22:06.663298092Z   File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
2020-05-26T11:22:06.663302011Z     result = self.fn(*self.args, **self.kwargs)
2020-05-26T11:22:06.663305729Z   File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 50, in train
2020-05-26T11:22:06.663309635Z     additional_arguments=additional_arguments,
2020-05-26T11:22:06.663313608Z   File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
2020-05-26T11:22:06.663317561Z   File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 101, in train_async
2020-05-26T11:22:06.663321531Z     additional_arguments,
2020-05-26T11:22:06.663325233Z   File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 188, in _train_async_internal
2020-05-26T11:22:06.663329270Z     additional_arguments=additional_arguments,
2020-05-26T11:22:06.663333118Z   File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 245, in _do_training
2020-05-26T11:22:06.663337077Z     persist_nlu_training_data=persist_nlu_training_data,
2020-05-26T11:22:06.663340899Z   File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 482, in _train_nlu_with_validated_data
2020-05-26T11:22:06.663344920Z     persist_nlu_training_data=persist_nlu_training_data,
2020-05-26T11:22:06.663349497Z   File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/train.py", line 75, in train
2020-05-26T11:22:06.663353484Z     trainer = Trainer(nlu_config, component_builder)
2020-05-26T11:22:06.663357284Z   File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/model.py", line 145, in __init__
2020-05-26T11:22:06.663361233Z     self.pipeline = self._build_pipeline(cfg, component_builder)
2020-05-26T11:22:06.663365012Z   File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/model.py", line 157, in _build_pipeline
2020-05-26T11:22:06.663369049Z     component = component_builder.create_component(component_cfg, cfg)
2020-05-26T11:22:06.663376801Z   File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/components.py", line 769, in create_component
2020-05-26T11:22:06.663380843Z     component = registry.create_component_by_config(component_config, cfg)
2020-05-26T11:22:06.663384700Z   File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/registry.py", line 246, in create_component_by_config
2020-05-26T11:22:06.663388816Z     return component_class.create(component_config, config)
2020-05-26T11:22:06.663392683Z   File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/components.py", line 483, in create
2020-05-26T11:22:06.663397008Z     return cls(component_config)
2020-05-26T11:22:06.663400808Z   File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 47, in __init__
2020-05-26T11:22:06.663404856Z     self._load_model()
2020-05-26T11:22:06.663408555Z   File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 81, in _load_model
2020-05-26T11:22:06.663412606Z     self.model_weights, cache_dir=self.cache_dir
2020-05-26T11:22:06.663416535Z   File "/opt/venv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 309, in from_pretrained
2020-05-26T11:22:06.663420600Z     return cls._from_pretrained(*inputs, **kwargs)
2020-05-26T11:22:06.663424350Z   File "/opt/venv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 400, in _from_pretrained
2020-05-26T11:22:06.663428433Z     raise EnvironmentError(msg)
2020-05-26T11:22:06.663432155Z OSError: Couldn't reach server at '{}' to download vocabulary files.

Tobias_Wochinger · May 27, 2020, 8:56am

@k1m Could you please share your model configuration?

The issue is that the Rasa X / Rasa Open Source images don’t run as the root user. The model that is downloaded as part of Rasa Open Source training is stored in a directory on disk (/.cache) that only root can write to. If you set the cache_dir parameter of the pipeline component to something like /tmp, it should work. If you hadn’t specified the cache_dir parameter before, please let us know; in that case we have to change the default for that parameter in Rasa Open Source.
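A minimal sketch for checking this from inside the container, assuming a Python shell is available there. It repeats the os.makedirs call from the traceback against a candidate directory and reports whether the current user can create and write to it. The function name can_create_cache_dir is made up for illustration and is not part of Rasa or transformers.

```python
import os


def can_create_cache_dir(path: str) -> bool:
    """Return True if `path` exists or can be created, and is writable.

    Hypothetical helper for illustration only. Note that it creates the
    directory as a side effect if it is missing and creation is allowed.
    """
    try:
        # Same call that raises PermissionError in the traceback above.
        os.makedirs(path, exist_ok=True)
    except OSError:
        # PermissionError is a subclass of OSError.
        return False
    # The directory exists; check that the current user may write to it.
    return os.access(path, os.W_OK)
```

Calling it with "/tmp" and with "/.cache" as the user the container runs under should show which of the two is usable as cache_dir.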

k1m · May 27, 2020, 11:48am

Thanks. cache_dir: /tmp fixed it.

Tobias_Wochinger · May 29, 2020, 7:27am

Awesome, did you have it changed previously or did you run with default settings before?

kearnsw · August 31, 2020, 10:01pm

I’m still receiving the error:

Downloading:  72%|███████▏  | 388M/536M [02:13<00:16, 8.87MB/s]
2020-08-31 21:55:39 ERROR    rasa.core.agent  - Failed to update model. The previous model will stay loaded instead. Error: Couldn't reach server at '{}' to download vocabulary files.
Traceback (most recent call last):
  File "/opt/venv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 383, in _from_pretrained
    resume_download=resume_download,
  File "/opt/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 238, in cached_path
    user_agent=user_agent,
  File "/opt/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 349, in get_from_cache
    os.makedirs(cache_dir, exist_ok=True)
  File "/usr/local/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/local/lib/python3.7/os.py", line 213, in makedirs
    makedirs(head, exist_ok=exist_ok)
  File "/usr/local/lib/python3.7/os.py", line 223, in makedirs
    mkdir(name, mode)
PermissionError: [Errno 13] Permission denied: '/.cache'

Rasa X installed via docker-compose doesn’t seem to respect the cache_dir: /tmp setting in my config.yml file:

pipeline:
  - name: HFTransformersNLP
    model_name: bert
    model_weights: bert-base-uncased
    cache_dir: /tmp
  - name: LanguageModelTokenizer
    intent_tokenization_flag: false
    intent_split_symbol: _
  - name: LanguageModelFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    retrieval_intent: faq_tl1
  - name: ResponseSelector
    retrieval_intent: faq_kl2
  - name: ResponseSelector
    retrieval_intent: faq_redcap
policies:
  - name: AugmentedMemoizationPolicy
  - name: TEDPolicy
  - name: MappingPolicy
  - name: FormPolicy
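One thing that can be ruled out is that the key never reaches the component, for example because of indentation. A minimal sketch, assuming the pipeline section above has already been parsed into a list of dicts (for instance with a YAML loader); find_cache_dir is a hypothetical helper, not a Rasa function, and it only inspects the parsed config, not what Rasa X actually passes to training.

```python
from typing import Any, Dict, List, Optional


def find_cache_dir(
    pipeline: List[Dict[str, Any]], component: str = "HFTransformersNLP"
) -> Optional[str]:
    """Return the cache_dir configured for `component`, or None if unset."""
    for entry in pipeline:
        if entry.get("name") == component:
            return entry.get("cache_dir")
    # Component is not in the pipeline at all.
    return None


# The first pipeline entry from the config above, as a parsed structure.
pipeline = [
    {
        "name": "HFTransformersNLP",
        "model_name": "bert",
        "model_weights": "bert-base-uncased",
        "cache_dir": "/tmp",
    },
]
print(find_cache_dir(pipeline))
```

If this reports /tmp for the config the server actually trains with, the setting is present in the file and the problem lies elsewhere.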

fkoerner · November 30, 2020, 7:41am

@kearnsw did you ever figure this one out?

kearnsw · November 30, 2020, 2:15pm

@fkoerner, I did not. I switched to the Helm chart installation.
