Can TED policy use distributed training?

I’m trying to use multiple GPUs to train the TED policy. As a first attempt, I added a MirroredStrategy to the training function of the TED policy. Below is run_training from ted_policy.py; I added a segment that uses MirroredStrategy and kept the old segment that uses only one GPU.

    def run_training(
        self, model_data: RasaModelData, label_ids: Optional[np.ndarray] = None
    ) -> None:
        """Feeds the featurized training data to the model.

        Args:
            model_data: Featurized training data.
            label_ids: Label ids corresponding to the data points in `model_data`.
                These may or may not be used by the function depending
                on how the policy is trained.
        """
        # os.environ.pop('TF_CONFIG', None)
        # tf_config = {
        #     'cluster': {
        #         'worker': ['localhost:12345', 'localhost:23456']
        #     },
        #     'task': {'type': 'worker', 'index': 0}
        # }
        # os.environ['TF_CONFIG'] = json.dumps(tf_config)
        # tf_config = json.loads(os.environ['TF_CONFIG'])
        # num_workers = len(tf_config['cluster']['worker'])
        if not self.finetune_mode:
            # This means the model wasn't loaded from a
            # previously trained model and hence needs
            # to be instantiated.
            self.model = self.model_class()(
                model_data.get_signature(),
                self.config,
                isinstance(self.featurizer, MaxHistoryTrackerFeaturizer),
                self._label_data,
                self._entity_tag_specs,
            )
            self.model.compile(
                optimizer=tf.keras.optimizers.Adam(self.config[LEARNING_RATE])
            )
            (
                data_generator,
                validation_data_generator,
            ) = rasa.utils.train_utils.create_data_generators(
                model_data,
                self.config[BATCH_SIZES],
                self.config[EPOCHS],
                self.config[BATCH_STRATEGY],
                self.config[EVAL_NUM_EXAMPLES],
                self.config[RANDOM_SEED],
            )
        callbacks = rasa.utils.train_utils.create_common_callbacks(
            self.config[EPOCHS],
            self.config[TENSORBOARD_LOG_DIR],
            self.config[TENSORBOARD_LOG_LEVEL],
            self.tmp_checkpoint_dir,
        )
        self.model.fit(
            data_generator,
            epochs=self.config[EPOCHS],
            validation_data=validation_data_generator,
            validation_freq=self.config[EVAL_NUM_EPOCHS],
            callbacks=callbacks,
            verbose=False,
            shuffle=False,  # we use custom shuffle inside data generator
        )


        global_batch_size = self.config[BATCH_SIZES] * 2
        # tf.debugging.set_log_device_placement(True)
        gpus = tf.config.list_logical_devices('GPU')
        strategy = tf.distribute.MirroredStrategy(gpus)

        if not self.finetune_mode:
            # This means the model wasn't loaded from a
            # previously trained model and hence needs
            # to be instantiated.
            with strategy.scope():
                self.model = self.model_class()(
                    model_data.get_signature(),
                    self.config,
                    isinstance(self.featurizer, MaxHistoryTrackerFeaturizer),
                    self._label_data,
                    self._entity_tag_specs,
                )
                self.model.compile(
                    optimizer=tf.keras.optimizers.Adam(self.config[LEARNING_RATE])
                )

                (
                    data_generator,
                    validation_data_generator,
                ) = rasa.utils.train_utils.create_data_generators(
                    model_data,
                    global_batch_size,
                    self.config[EPOCHS],
                    self.config[BATCH_STRATEGY],
                    self.config[EVAL_NUM_EXAMPLES],
                    self.config[RANDOM_SEED],
                )
        callbacks = rasa.utils.train_utils.create_common_callbacks(
            self.config[EPOCHS],
            self.config[TENSORBOARD_LOG_DIR],
            self.config[TENSORBOARD_LOG_LEVEL],
            self.tmp_checkpoint_dir,
        )
        self.model.fit(
            data_generator,
            epochs=self.config[EPOCHS],
            validation_data=validation_data_generator,
            validation_freq=self.config[EVAL_NUM_EPOCHS],
            callbacks=callbacks,
            verbose=False,
            shuffle=False,  # we use custom shuffle inside data generator
        )
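
For reference, the pattern I was trying to follow is the standard MirroredStrategy recipe from the TensorFlow distributed training guide. A minimal standalone sketch of that recipe (plain Keras model and synthetic data, nothing TED-specific) looks like this:

    import numpy as np
    import tensorflow as tf

    # Minimal sketch of the usual MirroredStrategy pattern: the model and
    # optimizer are created inside strategy.scope(), and the batch size is
    # scaled by the number of replicas. Plain Keras model and random data,
    # not the TED model.
    strategy = tf.distribute.MirroredStrategy()
    global_batch_size = 64 * strategy.num_replicas_in_sync

    with strategy.scope():
        model = tf.keras.Sequential(
            [tf.keras.layers.Dense(32, activation="relu"), tf.keras.layers.Dense(1)]
        )
        model.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")

    x = np.random.random((1024, 8)).astype("float32")
    y = np.random.random((1024, 1)).astype("float32")
    model.fit(x, y, batch_size=global_batch_size, epochs=2, verbose=0)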
The first run, without MirroredStrategy, works fine, but the run with MirroredStrategy hits a conflict inside the TED model. I can’t figure out the cause (distributed training makes it very hard to debug). This is the log from the run:

/root/rasa/rasa/shared/core/slot_mappings.py:216: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
  UserWarning,
/root/rasa/rasa/shared/core/slot_mappings.py:216: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
  UserWarning,
Processed story blocks: 100%|β–ˆβ–ˆβ–ˆ| 13/13 [00:00<00:00, 1271.09it/s, # trackers=1]
Processed story blocks: 100%|β–ˆβ–ˆβ–ˆ| 13/13 [00:00<00:00, 148.73it/s, # trackers=12]
Processed story blocks: 100%|β–ˆβ–ˆβ–ˆβ–ˆ| 13/13 [00:00<00:00, 21.72it/s, # trackers=50]
Processed story blocks: 100%|β–ˆβ–ˆβ–ˆβ–ˆ| 13/13 [00:00<00:00, 26.74it/s, # trackers=50]
Processed rules: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 48/48 [00:00<00:00, 252.89it/s, # trackers=1]
/root/rasa/rasa/utils/train_utils.py:530: UserWarning: constrain_similarities is set to `False`. It is recommended to set it to `True` when using cross-entropy loss.
  category=UserWarning,
/root/rasa/rasa/shared/utils/io.py:99: UserWarning: 'evaluate_every_number_of_epochs=20' is greater than 'epochs=2'. No evaluation will occur.
Processed trackers: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 512/512 [00:00<00:00, 965.76it/s, # action=1635]
/root/rasa/rasa/utils/tensorflow/model_data_utils.py:384: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.array(values), number_of_dimensions=4
/root/rasa/rasa/utils/tensorflow/model_data_utils.py:400: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  MASK: [FeatureArray(np.array(attribute_masks), number_of_dimensions=3)]
2022-02-17 08:55:08.464804: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-02-17 08:55:09.599068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 30652 MB memory:  -> device: 0, name: Tesla V100-SXM2-32GB, pci bus id: 0000:0a:00.0, compute capability: 7.0
2022-02-17 08:55:09.601503: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 30652 MB memory:  -> device: 1, name: Tesla V100-SXM2-32GB, pci bus id: 0000:85:00.0, compute capability: 7.0
/root/rasa/rasa/utils/tensorflow/model_data.py:750: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.concatenate(np.array(f)),
Epochs:   0%|                                             | 0/2 [00:00<?, ?it/s]2022-02-17 08:55:10.764997: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_1_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_1_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_1_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_2_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_2_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_2_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
Epochs:  50%|β–ˆβ–ˆβ–Œ  | 1/2 [00:33<00:33, 33.25s/it, t_loss=6, loss=5.72, acc=0.518]/root/rasa/rasa/utils/tensorflow/model_data.py:750: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.concatenate(np.array(f)),
Epochs: 100%|β–ˆβ–ˆ| 2/2 [00:53<00:00, 26.99s/it, t_loss=5.44, loss=4.94, acc=0.921]
Epochs:   0%|                                             | 0/2 [00:00<?, ?it/s]WARNING:tensorflow:Using MirroredStrategy eagerly has significant overhead currently. We will be working on improving this in the future, but for now please wrap `call_for_each_replica` or `experimental_run` or `run` inside a tf.function to get the best performance.
2022-02-17 08:56:05.702741: W tensorflow/core/grappler/optimizers/data/auto_shard.cc:695] AUTO sharding policy will apply DATA sharding policy as it failed to apply FILE sharding policy because of the following reason: Did not find a shardable source, walked to a node which is not a dataset: name: "FlatMapDataset/_2"
op: "FlatMapDataset"
input: "TensorDataset/_1"
attr {
  key: "Targuments"
  value {
    list {
    }
  }
}
attr {
  key: "f"
  value {
    func {
      name: "__inference_Dataset_flat_map_flat_map_fn_21967"
    }
  }
}
attr {
  key: "output_shapes"
  value {
    list {
      shape {
        dim {
          size: -1
        }
        dim {
          size: -1
        }
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
        dim {
          size: -1
        }
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
        dim {
          size: -1
        }
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
        dim {
          size: -1
        }
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
      }
      shape {
        dim {
          size: -1
        }
      }
    }
  }
}
attr {
  key: "output_types"
  value {
    list {
      type: DT_FLOAT
      type: DT_INT64
      type: DT_FLOAT
      type: DT_INT64
      type: DT_FLOAT
      type: DT_FLOAT
      type: DT_INT64
      type: DT_FLOAT
      type: DT_INT64
      type: DT_FLOAT
      type: DT_FLOAT
      type: DT_INT64
      type: DT_FLOAT
      type: DT_INT64
    }
  }
}
. Consider either turning off auto-sharding or switching the auto_shard_policy to DATA to shard this dataset. You can do this by creating a new `tf.data.Options()` object then setting `options.experimental_distribute.auto_shard_policy = AutoShardPolicy.DATA` before applying the options object to the dataset via `dataset.with_options(options)`.
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_4_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_4_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_4_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_5_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_5_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_5_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_6_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_6_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_6_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/replica_1/cond_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/replica_1/cond_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/replica_1/cond_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/replica_1/cond_1_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/replica_1/cond_1_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/replica_1/cond_1_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/replica_1/cond_2_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/replica_1/cond_2_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/replica_1/cond_2_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "shape. This may consume a large amount of memory." % value)
Traceback (most recent call last):
  File "/root/rasa/rasa/engine/graph.py", line 467, in __call__
    output = self._fn(self._component, **run_kwargs)
  File "/root/rasa/rasa/core/policies/ted_policy.py", line 777, in train
    self.run_training(model_data, label_ids)
  File "/root/rasa/rasa/core/policies/ted_policy.py", line 740, in run_training
    shuffle=False,  # we use custom shuffle inside data generator
  File "/root/rasa/rasa/utils/tensorflow/temp_keras_modules.py", line 190, in fit
    tmp_logs = train_function(iterator)
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3040, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1964, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/root/rasa/.venv/lib/python3.7/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 3 root error(s) found.
  (0) Invalid argument:  Dimensions [0,1) of indices[shape=[17,2]] must match dimensions [0,1) of updates[shape=[24,50]]
	 [[{{node cond_4/StatefulPartitionedCall/cond_4_20/then/_877/cond_4/ScatterNd}}]]
	 [[div_no_nan_1/ReadVariableOp/_892]]
  (1) Invalid argument:  Dimensions [0,1) of indices[shape=[17,2]] must match dimensions [0,1) of updates[shape=[24,50]]
	 [[{{node cond_4/StatefulPartitionedCall/cond_4_20/then/_877/cond_4/ScatterNd}}]]
  (2) Invalid argument:  Dimensions [0,1) of indices[shape=[17,2]] must match dimensions [0,1) of updates[shape=[24,50]]
	 [[{{node cond_4/StatefulPartitionedCall/cond_4_20/then/_877/cond_4/ScatterNd}}]]
	 [[update_0/AssignAddVariableOp/_845]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_52312]

Function call stack:
train_function -> train_function -> train_function


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "run.py", line 54, in <module>
    run(cmdline_arguments)
  File "/root/rasa/tools/controller.py", line 53, in run
    run_train_core(args)
  File "/root/rasa/tools/training_tools.py", line 90, in run_train_core
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/root/rasa/rasa/model_training.py", line 346, in train_core
    **(additional_arguments or {}),
  File "/root/rasa/rasa/model_training.py", line 242, in _train_graph
    is_finetuning=is_finetuning,
  File "/root/rasa/rasa/engine/training/graph_trainer.py", line 108, in train
    graph_runner.run(inputs={PLACEHOLDER_IMPORTER: importer})
  File "/root/rasa/rasa/engine/runner/dask.py", line 106, in run
    dask_result = dask.get(run_graph, run_targets)
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 558, in get_sync
    **kwargs,
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 496, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "/root/miniconda3/lib/python3.7/concurrent/futures/_base.py", line 425, in result
    return self.__get_result()
  File "/root/miniconda3/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 538, in submit
    fut.set_result(fn(*args, **kwargs))
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 234, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 234, in <listcomp>
    return [execute_task(*a) for a in it]
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 225, in execute_task
    result = pack_exception(e, dumps)
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/local.py", line 220, in execute_task
    result = _execute_task(task, data)
  File "/root/rasa/.venv/lib/python3.7/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/root/rasa/rasa/engine/graph.py", line 476, in __call__
    ) from e
rasa.engine.exceptions.GraphComponentException: Error running graph component for node train_TEDPolicy2.
Epochs:   0%|                                             | 0/2 [00:23<?, ?it/s]
2022-02-17 08:56:29.372037: W tensorflow/core/kernels/data/generator_dataset_op.cc:107] Error occurred when finalizing GeneratorDataset iterator: Failed precondition: Python interpreter state is not initialized. The process may be terminated.
	 [[{{node PyFunc}}]]
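
Side note: the auto-sharding warning in the middle of the log suggests switching the auto_shard_policy to DATA. Rasa’s data generator is a Keras Sequence rather than a tf.data.Dataset, so I’m not sure that advice applies directly here, but for reference this is roughly what the suggested option change looks like on a plain dataset (the dataset below is just a stand-in):

    import tensorflow as tf

    # Sketch of the option change suggested by the auto-sharding warning:
    # apply DATA sharding so MirroredStrategy does not look for a file source.
    # The dataset here is a stand-in tf.data.Dataset, not Rasa's data generator.
    dataset = tf.data.Dataset.from_tensor_slices(tf.zeros([128, 8])).batch(32)

    options = tf.data.Options()
    options.experimental_distribute.auto_shard_policy = (
        tf.data.experimental.AutoShardPolicy.DATA
    )
    dataset = dataset.with_options(options)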

I have opened a similar issue on the Rasa GitHub repository but haven’t seen a reply so far. I really need help from anybody who is familiar with the TED model and distributed training to fix this bug. Much appreciated!


Did you manage to achieve anything? I’m having very similar issues when trying to train DIET on multiple GPUs.
