Background: After distributing the Dask graph to a Ray cluster with only a single node (discussed in this post), I tried the same code on a multi-node cluster but was met with this error:
```
  File "/home/azureuser/bot/Raysa-Rasa/rasa/core/policies/memoization.py", line 184, in train
    self.persist()
  File "/home/azureuser/bot/Raysa-Rasa/rasa/core/policies/memoization.py", line 269, in persist
    with self._model_storage.write_to(self._resource) as path:
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/contextlib.py", line 112, in __enter__
    return next(self.gen)
  File "/home/azureuser/bot/Raysa-Rasa/rasa/engine/storage/local_model_storage.py", line 121, in write_to
    directory.mkdir()
  File "/home/azureuser/miniconda3/envs/raysa_env/lib/python3.7/pathlib.py", line 1273, in mkdir
    self._accessor.mkdir(self, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp6i3dehxe/train_MemoizationPolicy0'
```
Pathlib’s `mkdir` fails to create the directory for the Memoization policy. What’s strange is that it succeeds for every policy up to this one, even though persistence to storage is handled by the same `LocalModelStorage.write_to` method for all of them. I tried setting the parameter `parents=True` to prevent the `FileNotFoundError`, but that did not work: I get the exact same error as if nothing changed.
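For reference, this is the `Path.mkdir()` behavior I expected: without `parents=True` it raises `FileNotFoundError` when an intermediate directory is missing, and with `parents=True` it creates the intermediates and succeeds, at least on a local filesystem. A minimal sketch (paths here are made up, not the actual Rasa ones):

```python
import shutil
import tempfile
from pathlib import Path

# A real temp directory, plus a target whose parent does not exist yet.
base = Path(tempfile.mkdtemp())
target = base / "missing_parent" / "train_MemoizationPolicy0"

# Without parents=True, the missing intermediate directory raises.
try:
    target.mkdir()
    raised = False
except FileNotFoundError:
    raised = True

# With parents=True, the intermediates are created and mkdir succeeds.
target.mkdir(parents=True)
created = target.is_dir()

shutil.rmtree(base)  # clean up
```

Since `parents=True` still fails in my case, I suspect the parent temp directory (`/tmp/tmp6i3dehxe`) may simply not exist on the node where this particular task runs, as each node in the cluster has its own `/tmp`.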
Here is the complete error output: error_output.txt (22.5 KB). Any help or pointers on where I should look would be greatly appreciated!
Edit 1: Should I rebuild the bot from source?