Unable to train Rasa in Docker with DIETClassifier and BERT in NLU config.yml

I am trying to use the following configuration in config.yml:

```yaml
# Configuration for Rasa NLU.
language: en

pipeline:
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "bert-base-uncased"
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  - name: CountVectorsFeaturizer
    lowercase: True
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: RegexFeaturizer
  - name: DIETClassifier
    epochs: 30
    number_of_transformer_layers: 4
    transformer_size: 256
    use_masked_language_model: True
    drop_rate: 0.25
    weight_sparsity: 0.7
    batch_size: [64, 256]
    embedding_dimension: 30
    hidden_layer_sizes:
      text: [512, 128]
  - name: EntitySynonymMapper

# Configuration for Rasa Core.
policies:
  # TED policy
  - name: TEDPolicy
    max_history: 5
    epochs: 60
    batch_size: 50
  - name: MemoizationPolicy
  - name: MappingPolicy
  - name: FormPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.7
    core_threshold: 0.7
    fallback_action_name: "action_hr_fallback"
```

I am able to train locally without Docker, but inside Docker it fails with the following error:

```
2020-05-01 07:56:10 INFO     filelock - Lock 139650696593192 acquired on /root/.cache/torch/transformers/d667df51ec24c20190f01fb4c20a21debc4c4fc12f7e2f5441ac0a99690e3ee9.4733ec82e81d40e9cf5fd04556267d8958fb150e9339390fc64206b7e5a79c83.h5.lock
Downloading: 100%|██████████| 536M/536M [01:03<00:00, 8.48MB/s]
2020-05-01 07:57:13 INFO     filelock - Lock 139650696593192 released on /root/.cache/torch/transformers/d667df51ec24c20190f01fb4c20a21debc4c4fc12f7e2f5441ac0a99690e3ee9.4733ec82e81d40e9cf5fd04556267d8958fb150e9339390fc64206b7e5a79c83.h5.lock
2020-05-01 07:57:13 INFO     transformers.modeling_tf_utils - loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-tf_model.h5 from cache at /root/.cache/torch/transformers/d667df51ec24c20190f01fb4c20a21debc4c4fc12f7e2f5441ac0a99690e3ee9.4733ec82e81d40e9cf5fd04556267d8958fb150e9339390fc64206b7e5a79c83.h5
Training NLU model...
Traceback (most recent call last):
  File "/opt/venv/bin/rasa", line 8, in <module>
    sys.exit(main())
  File "/opt/venv/lib/python3.6/site-packages/rasa/__main__.py", line 91, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/opt/venv/lib/python3.6/site-packages/rasa/cli/train.py", line 140, in train_nlu
    persist_nlu_training_data=args.persist_nlu_data,
  File "/opt/venv/lib/python3.6/site-packages/rasa/train.py", line 414, in train_nlu
    persist_nlu_training_data,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/opt/venv/lib/python3.6/site-packages/rasa/train.py", line 445, in _train_nlu_async
    persist_nlu_training_data=persist_nlu_training_data,
  File "/opt/venv/lib/python3.6/site-packages/rasa/train.py", line 474, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "/opt/venv/lib/python3.6/site-packages/rasa/nlu/train.py", line 74, in train
    trainer = Trainer(nlu_config, component_builder)
  File "/opt/venv/lib/python3.6/site-packages/rasa/nlu/model.py", line 145, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "/opt/venv/lib/python3.6/site-packages/rasa/nlu/model.py", line 157, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "/opt/venv/lib/python3.6/site-packages/rasa/nlu/components.py", line 755, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "/opt/venv/lib/python3.6/site-packages/rasa/nlu/registry.py", line 246, in create_component_by_config
    return component_class.create(component_config, config)
  File "/opt/venv/lib/python3.6/site-packages/rasa/nlu/components.py", line 469, in create
    return cls(component_config)
  File "/opt/venv/lib/python3.6/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 47, in __init__
    self._load_model()
  File "/opt/venv/lib/python3.6/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 84, in _load_model
    self.model_weights, cache_dir=self.cache_dir
  File "/opt/venv/lib/python3.6/site-packages/transformers/modeling_tf_utils.py", line 355, in from_pretrained
    model.load_weights(resolved_archive_file, by_name=True)
  File "/opt/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 234, in load_weights
    return super(Model, self).load_weights(filepath, by_name, skip_mismatch)
  File "/opt/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 1220, in load_weights
    f, self.layers, skip_mismatch=skip_mismatch)
  File "/opt/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 745, in load_weights_from_hdf5_group_by_name
    weight_values = [np.asarray(g[weight_name]) for weight_name in weight_names]
  File "/opt/venv/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 745, in <listcomp>
    weight_values = [np.asarray(g[weight_name]) for weight_name in weight_names]
  File "/opt/venv/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "/opt/venv/lib/python3.6/site-packages/h5py/_hl/dataset.py", line 766, in __array__
    arr = numpy.empty(self.shape, dtype=self.dtype if dtype is None else dtype)
MemoryError: Unable to allocate 89.4 MiB for an array with shape (30522, 768) and data type float32
```
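As a sanity check on the message itself, the 89.4 MiB in the MemoryError is exactly the size of one float32 weight matrix of shape (30522, 768), which is BERT's token-embedding table (vocabulary size × hidden size), so the container is running out of memory while loading the model weights:

```shell
# Size of a float32 array of shape (30522, 768): 4 bytes per element.
awk 'BEGIN { printf "%.1f MiB\n", 30522 * 768 * 4 / (1024 * 1024) }'
# prints: 89.4 MiB
```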

My Dockerfile is this:

```dockerfile
FROM rasa/rasa:1.9.5
USER root

COPY ./ ./main
RUN chown -R 777 ./main

WORKDIR ./main

RUN chmod 777 train_rva_bot.sh
RUN pip install --no-cache-dir rasa[transformers]
RUN ./train_rva_bot.sh

CMD ["/bin/bash"]
```

Seems like you are running out of memory inside the Docker container. To save some memory you could use the `cache_dir` option of `HFTransformersNLP`: point it to a directory outside of the Docker image and mount that directory via Docker's `-v` option. The BERT model is then downloaded to your local file system instead of being stored inside the Docker container. Does that help?
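A sketch of that suggestion (the host directory `./hf_cache`, the in-container path `/app/hf_cache`, and running training through the stock `rasa/rasa` image are assumptions for illustration, not details from this thread):

```shell
# 1. In config.yml, give HFTransformersNLP a cache directory inside the container:
#
#      - name: HFTransformersNLP
#        model_name: "bert"
#        model_weights: "bert-base-uncased"
#        cache_dir: "/app/hf_cache"
#
# 2. Mount a host directory at that path when training, so the ~0.5 GB BERT
#    weights are downloaded once to the host file system instead of being
#    baked into the image or re-downloaded on every run:
docker run \
  -v "$(pwd):/app" \
  -v "$(pwd)/hf_cache:/app/hf_cache" \
  rasa/rasa:1.9.5 train nlu
```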


Thanks @Tanja. It did work for me.

@Tanja can you please share an example Dockerfile to do what you are suggesting here?