$ rasa train -vv
Training WITH Ray ...
2022-02-03 04:28:20 DEBUG ray.worker - Automatically increasing RLIMIT_NOFILE to max value of 1048576
2022-02-03 04:28:20,784 INFO worker.py:843 -- Connecting to existing Ray cluster at address: 10.2.0.4:6379
[{'Alive': True, 'MetricsExportPort': 51433, 'NodeID': '73bfc9378f3cb010341e376d1d442c104f72bbd58b68b789d2b4dad5', 'NodeManagerAddress': '10.2.0.5', 'NodeManagerHostname': 'worker1VM', 'NodeManagerPort': 38727, 'ObjectManagerPort': 37479, 'ObjectStoreSocketName': '/tmp/ray/session_2022-02-03_04-27-23_532819_6239/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2022-02-03_04-27-23_532819_6239/sockets/raylet', 'Resources': {'CPU': 2.0, 'memory': 2509863732.0, 'node:10.2.0.5': 1.0, 'object_store_memory': 1075655884.0}, 'alive': True},
 {'Alive': True, 'MetricsExportPort': 59432, 'NodeID': '31db27d5923eb0f42913e8846ca429a26c4e7ffdde4bd319374c9ff6', 'NodeManagerAddress': '10.2.0.6', 'NodeManagerHostname': 'worker2VM', 'NodeManagerPort': 36057, 'ObjectManagerPort': 43831, 'ObjectStoreSocketName': '/tmp/ray/session_2022-02-03_04-27-23_532819_6239/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2022-02-03_04-27-23_532819_6239/sockets/raylet', 'Resources': {'CPU': 2.0, 'memory': 5624313856.0, 'node:10.2.0.6': 1.0, 'object_store_memory': 2410420224.0}, 'alive': True},
 {'Alive': True, 'MetricsExportPort': 54370, 'NodeID': '2c10e28c2c97ce974f09cf531896893b0a6ac0c8a7102d51ab8bfe65', 'NodeManagerAddress': '10.2.0.4', 'NodeManagerHostname': 'masterVM', 'NodeManagerPort': 34801, 'ObjectManagerPort': 34297, 'ObjectStoreSocketName': '/tmp/ray/session_2022-02-03_04-27-23_532819_6239/sockets/plasma_store', 'RayletSocketName': '/tmp/ray/session_2022-02-03_04-27-23_532819_6239/sockets/raylet', 'Resources': {'CPU': 2.0, 'memory': 2335420416.0, 'node:10.2.0.4': 1.0, 'object_store_memory': 1167710208.0}, 'alive': True}]
This cluster consists of 3 nodes in total
6.0 CPU resources in total
Timer started ...
2022-02-03 04:28:21 DEBUG h5py._conv - Creating converter from 7 to 5
2022-02-03 04:28:21 DEBUG h5py._conv - Creating converter from 5 to 7
2022-02-03 04:28:21 DEBUG h5py._conv - Creating converter from 7 to 5
2022-02-03 04:28:21 DEBUG h5py._conv - Creating converter from 5 to 7
2022-02-03 04:28:23 DEBUG rasa.shared.nlu.training_data.loading - Training data format of 'data/nlu.yml' is 'rasa_yml'.
2022-02-03 04:28:23 DEBUG rasa.shared.nlu.training_data.loading - Training data format of 'data/rules.yml' is 'unk'.
2022-02-03 04:28:23 DEBUG rasa.shared.nlu.training_data.loading - Training data format of 'data/stories.yml' is 'unk'.
/home/azureuser/bot/Raysa-Rasa/rasa/shared/core/slot_mappings.py:217: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
  UserWarning,
2022-02-03 04:28:23 DEBUG rasa.shared.nlu.training_data.loading - Training data format of 'data/nlu.yml' is 'rasa_yml'.
2022-02-03 04:28:23 DEBUG rasa.shared.importers.importer - Added 14 training data examples from the story training data.
2022-02-03 04:28:23 DEBUG rasa.shared.nlu.training_data.loading - Training data format of 'data/nlu.yml' is 'rasa_yml'.
2022-02-03 04:28:23 DEBUG rasa.telemetry - Skipping request to external service: telemetry key not set.
2022-02-03 04:28:23 DEBUG rasa.engine.caching - Deleted 0 from disk as their version is older than the minimum compatible version ('3.0.0').
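(The node list printed above is the raw return value of ray.nodes() after connecting to the existing cluster head, and the two summary lines are derived directly from it. A minimal sketch of that derivation, assuming a plain Ray client session; this is illustrative, not the Raysa-Rasa code itself:

import ray

# Connect to the already running cluster head (address taken from the log above).
ray.init(address="10.2.0.4:6379")

# ray.nodes() returns the list of node dicts printed above.
nodes = [n for n in ray.nodes() if n["Alive"]]
total_cpus = sum(n["Resources"].get("CPU", 0.0) for n in nodes)

print(f"This cluster consists of {len(nodes)} nodes in total")
print(f"{total_cpus} CPU resources in total")
)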
2022-02-03 04:28:23 DEBUG rasa.telemetry - Skipping request to external service: telemetry key not set.
2022-02-03 04:28:23 DEBUG rasa.engine.training.graph_trainer - Starting training.
/home/azureuser/bot/Raysa-Rasa/rasa/shared/core/slot_mappings.py:217: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
  UserWarning,
2022-02-03 04:28:23 DEBUG rasa.engine.training.graph_trainer - Skip fingerprint run as a full training of the model was enforced.
2022-02-03 04:28:23 DEBUG rasa.engine.training.graph_trainer - Running the pruned train graph with real node execution.
--------- start graph runner
2022-02-03 04:28:23 DEBUG rasa.engine.runner.dask_on_ray - Running graph with inputs: {'__importer__': E2EImporter}, targets: None and ExecutionContext(model_id=None, should_add_diagnostic_data=False, is_finetuning=False, node_name=None).
(dask:schema_validator pid=6781) 2022-02-03 04:28:24.770587: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(dask:schema_validator pid=6781) 2022-02-03 04:28:24.770631: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
(dask:finetuning_validator pid=6781) R_PATH: /tmp/tmpca7xnueg/finetuning_validator
(dask:finetuning_validator pid=6781) R_DIR NOT EXISTING: /tmp/tmpca7xnueg/finetuning_validator
(dask:domain_provider pid=6781) R_PATH: /tmp/tmpca7xnueg/domain_provider
(dask:domain_provider pid=6781) R_DIR NOT EXISTING: /tmp/tmpca7xnueg/domain_provider
(dask:schema_validator pid=6781) /home/azureuser/bot/Raysa-Rasa/rasa/shared/core/slot_mappings.py:217: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
(dask:schema_validator pid=6781)   UserWarning,
(dask:schema_validator pid=6781) :task_name:dask:finetuning_validator
Processed story blocks: 100%|██████████| 3/3 [00:00<00:00, 2264.74it/s, # trackers=1]
Processed story blocks: 100%|██████████| 3/3 [00:00<00:00, 1173.23it/s, # trackers=3]
Processed story blocks: 100%|██████████| 3/3 [00:00<00:00, 249.65it/s, # trackers=12]
Processed story blocks: 100%|██████████| 3/3 [00:00<00:00, 57.97it/s, # trackers=39]
Processed rules: 100%|██████████| 2/2 [00:00<00:00, 2717.40it/s, # trackers=1]
(dask:train_RegexFeaturizer1 pid=6780) 2022-02-03 04:28:27.692411: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(dask:train_RegexFeaturizer1 pid=6780) 2022-02-03 04:28:27.692455: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
(dask:train_MemoizationPolicy0 pid=4144, ip=10.2.0.6) 2022-02-03 04:28:28.618443: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
(dask:train_MemoizationPolicy0 pid=4144, ip=10.2.0.6) 2022-02-03 04:28:28.618477: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
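(The dask_on_ray runner above hands the pruned training graph to Ray's Dask-on-Ray scheduler, ray.util.dask.ray_dask_get, which is why each graph node such as dask:train_MemoizationPolicy0 runs as its own Ray task and may land on a different VM. A minimal sketch of that scheduling pattern with a toy graph, not the actual Rasa training graph:

import ray
import dask
from ray.util.dask import ray_dask_get

ray.init(address="auto")  # join the existing cluster, as in the log above

@dask.delayed
def double(x):
    return 2 * x

graph = double(double(21))

# Execute the Dask task graph on the Ray cluster instead of Dask's default
# scheduler; each task may be placed on any node, like the dask:* tasks above.
print(graph.compute(scheduler=ray_dask_get))  # 84
)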
(dask:train_UnexpecTEDIntentPolicy2 pid=6781) P_RES: Resource(name='train_UnexpecTEDIntentPolicy2', output_fingerprint='c1f6f0269f0240b89b4bf8fce5707585')
Processed trackers: 100%|██████████| 120/120 [00:00<00:00, 3213.88it/s, # intent=12]
(dask:train_UnexpecTEDIntentPolicy2 pid=6781) /home/azureuser/bot/Raysa-Rasa/rasa/utils/tensorflow/model_data_utils.py:384: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
(dask:train_UnexpecTEDIntentPolicy2 pid=6781)   np.array(values), number_of_dimensions=4
(dask:train_UnexpecTEDIntentPolicy2 pid=6781) /home/azureuser/bot/Raysa-Rasa/rasa/utils/tensorflow/model_data_utils.py:400: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
(dask:train_UnexpecTEDIntentPolicy2 pid=6781)   MASK: [FeatureArray(np.array(attribute_masks), number_of_dimensions=3)]
(dask:train_UnexpecTEDIntentPolicy2 pid=6781) 2022-02-03 04:28:28.799076: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
(dask:train_UnexpecTEDIntentPolicy2 pid=6781) 2022-02-03 04:28:28.799110: W tensorflow/stream_executor/cuda/cuda_driver.cc:269] failed call to cuInit: UNKNOWN ERROR (303)
(dask:train_UnexpecTEDIntentPolicy2 pid=6781) 2022-02-03 04:28:28.799131: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (masterVM): /proc/driver/nvidia/version does not exist
(dask:train_UnexpecTEDIntentPolicy2 pid=6781) 2022-02-03 04:28:28.799360: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
(dask:train_UnexpecTEDIntentPolicy2 pid=6781) To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Epochs:   0%|          | 0/100 [00:00<?, ?it/s]
Traceback (most recent call last):
    main()
  File "/home/azureuser/bot/Raysa-Rasa/rasa/__main__.py", line 119, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/azureuser/bot/Raysa-Rasa/rasa/cli/train.py", line 60, in <lambda>
    train_parser.set_defaults(func=lambda args: run_training(args, can_exit=True))
  File "/home/azureuser/bot/Raysa-Rasa/rasa/cli/train.py", line 129, in run_training
    finetuning_epoch_fraction=args.epoch_fraction,
  File "/home/azureuser/bot/Raysa-Rasa/rasa/api.py", line 118, in train_dist
    finetuning_epoch_fraction=finetuning_epoch_fraction,
  File "/home/azureuser/bot/Raysa-Rasa/rasa/model_training.py", line 175, in train_dist
    **(nlu_additional_arguments or {}),
  File "/home/azureuser/bot/Raysa-Rasa/rasa/model_training.py", line 241, in _train_graph_dist
    is_finetuning=is_finetuning,
  File "/home/azureuser/bot/Raysa-Rasa/rasa/engine/training/graph_trainer.py", line 174, in train_dist
    graph_runner.run(inputs={PLACEHOLDER_IMPORTER: importer})
  File "/home/azureuser/bot/Raysa-Rasa/rasa/engine/runner/dask_on_ray.py", line 132, in run
    dask_result = ray_dask_get(run_graph, run_targets)
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/site-packages/ray/util/dask/scheduler.py", line 127, in ray_dask_get
    result = ray_get_unpack(object_refs, progress_bar_actor=pb_actor)
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/site-packages/ray/util/dask/scheduler.py", line 423, in ray_get_unpack
    return get_result(object_refs)
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/site-packages/ray/util/dask/scheduler.py", line 408, in get_result
    return ray.get(object_refs)
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/site-packages/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/site-packages/ray/worker.py", line 1713, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(GraphComponentException): ray::dask:train_MemoizationPolicy0 (pid=4144, ip=10.2.0.6)
  File "/home/azureuser/bot/Raysa-Rasa/rasa/core/policies/memoization.py", line 184, in train
    self.persist()
  File "/home/azureuser/bot/Raysa-Rasa/rasa/core/policies/memoization.py", line 269, in persist
    with self._model_storage.write_to(self._resource) as path:
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/azureuser/bot/Raysa-Rasa/rasa/engine/storage/local_model_storage.py", line 121, in write_to
    directory.mkdir()
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/pathlib.py", line 1273, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpca7xnueg/train_MemoizationPolicy0'

The above exception was the direct cause of the following exception:

ray::dask:train_MemoizationPolicy0 (pid=4144, ip=10.2.0.6)
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/site-packages/ray/util/dask/scheduler.py", line 350, in dask_task_wrapper
    result = func(*actual_args)
  File "/home/azureuser/bot/Raysa-Rasa/rasa/engine/graph.py", line 516, in __call__
    ) from e
rasa.engine.exceptions.GraphComponentException: Error running graph component for node train_MemoizationPolicy0.
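(What the inner traceback shows: the model storage lives under a temporary directory, /tmp/tmpca7xnueg, that was created on the machine where training started, but train_MemoizationPolicy0 was scheduled by Ray onto worker2VM (10.2.0.6), where that directory does not exist. Path.mkdir() does not create missing parents, so it raises FileNotFoundError. A minimal sketch of the same failure mode in isolation, using the path from the traceback and assuming it is absent on the machine running the snippet:

from pathlib import Path

# The parent temp dir exists only on the node that created the model storage.
resource_dir = Path("/tmp/tmpca7xnueg") / "train_MemoizationPolicy0"

try:
    resource_dir.mkdir()  # same call as directory.mkdir() in local_model_storage.py
except FileNotFoundError as err:
    # [Errno 2] No such file or directory: '/tmp/tmpca7xnueg/train_MemoizationPolicy0'
    print(err)

Presumably the storage directory would need to exist, and be shared, on every node that runs a training task, not only on the master, for persisting to succeed.)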
Processed actions: 12it [00:00, 2187.29it/s, # examples=12]
Processed trackers: 100%|██████████| 2/2 [00:00<00:00, 3552.99it/s, # action=5]
Processed actions: 5it [00:00, 16657.28it/s, # examples=4]
Processed trackers: 100%|██████████| 3/3 [00:00<00:00, 2804.30it/s, # action=12]
Processed trackers: 100%|██████████| 2/2 [00:00<00:00, 1914.33it/s]
Processed trackers: 100%|██████████| 5/5 [00:00<00:00, 1467.05it/s]
(dask:train_TEDPolicy3 pid=4031, ip=10.2.0.5) /home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/site-packages/tensorflow/python/framework/indexed_slices.py:449: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("gradients/cond_grad/Identity_1:0", shape=(None,), dtype=int64), values=Tensor("gradients/cond_grad/Identity:0", shape=(None,), dtype=float32), dense_shape=Tensor("gradients/cond_grad/Identity_2:0", shape=(1,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory.
(dask:train_TEDPolicy3 pid=4031, ip=10.2.0.5)   "shape. This may consume a large amount of memory." % value)