Dask and Rasa

bparbhu · November 26, 2021, 3:54pm

Hi All,

First time posting here. I’m a big fan of Rasa and the options it gives those of us looking to create meaningful and powerful chatbots. Something I’ve been curious about, though, is whether Rasa can use Dask to help train Rasa models.

I came across the rasa.engine.runner.dask documentation located here:

I would just like some more detail on how you can use Rasa with Dask, and on which parts of Dask Rasa is using. For example, is Rasa using the Dask graph and executor to train Rasa models on a Dask cluster or on your local machine?

Also, what examples currently exist for using Rasa with Dask?

Thanks again and much appreciated!

-Brian

toza-mimoza · January 26, 2022, 5:01pm

Hi Brian, I am interested in this subject and want to use Dask on Ray. Looking through the code, I noticed that Dask is used, but I’m tied to Ray for my bachelor thesis.

Anyway, Rasa already uses Dask for the training graph, at least in versions 3.x and later, but it calls dask.get() (line 101 of dask.py), which is Dask’s single-threaded, synchronous scheduler. Apparently TensorFlow already runs tasks in parallel locally but does not scale well beyond one machine.
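For readers unfamiliar with Dask's graph format, here is a toy synchronous executor that illustrates what "single-threaded, synchronous" means. It is not Dask's code and not Rasa's graph: it only mimics the task-graph convention that dask.get accepts (a dict mapping each key to a literal or to a tuple (function, *args), where an argument naming another key stands for that task's result) and walks it depth-first in the calling thread, one task at a time. The graph contents and key names are hypothetical.

```python
from typing import Any, Dict, Hashable, Optional

Graph = Dict[Hashable, Any]


def is_task(value: Any) -> bool:
    """In the Dask-style convention, a task is a tuple whose first item is callable."""
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def is_key(graph: Graph, obj: Any) -> bool:
    """An argument refers to another task if it is a key of the graph."""
    try:
        return obj in graph
    except TypeError:  # unhashable arguments (e.g. lists) are never keys
        return False


def get_sync(graph: Graph, key: Hashable,
             cache: Optional[Dict[Hashable, Any]] = None) -> Any:
    """Toy synchronous scheduler: compute `key` depth-first, one task at a time."""
    if cache is None:
        cache = {}
    if key in cache:
        return cache[key]
    value = graph[key]
    if is_task(value):
        func, *args = value
        # Dependencies are computed first, sequentially, in this same thread.
        resolved = [get_sync(graph, a, cache) if is_key(graph, a) else a
                    for a in args]
        result = func(*resolved)
    else:
        result = value  # literal data
    cache[key] = result
    return result


# Hypothetical graph: two independent "featurizers" feeding one "train" step.
graph = {
    "tokens": "hello world",
    "featurize_a": (str.upper, "tokens"),
    "featurize_b": (len, "tokens"),
    "train": (lambda a, b: f"{a}:{b}", "featurize_a", "featurize_b"),
}

result = get_sync(graph, "train")  # → "HELLO WORLD:11"
```

Even though featurize_a and featurize_b do not depend on each other, a synchronous scheduler runs them one after the other. With Dask installed, dask.get(graph, "train") should return the same value for this graph.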

bparbhu · January 26, 2022, 5:59pm

Hi @toza-mimoza ,

I’m very interested in this as well. Mainly, my use case is making sure that what’s available here could be handed off to a Dask cluster, in either a local or a remote setting. I’m also interested in RAPIDS for NLP on a GPU. If any form of this works with the Dask executor, that would be a step in the right direction. It would be odd to integrate Dask into Rasa and leave that ability out.

-Brian

toza-mimoza · January 26, 2022, 6:18pm

Hi @bparbhu ,

I suppose it would be possible to use Dask both locally and remotely, although I am not very familiar with Dask. Let’s stir up the community and developers to help us.

Here is my suggestion/question for DaskGraphRunner regarding the synchronous vs. threaded scheduler: [DaskGraphRunner] dask.threaded.get instead of dask.get · Issue #10754 · RasaHQ/rasa · GitHub.

Here is my post regarding Dask on Ray integration: Dask on Ray for DaskGraphRunner: Serialization of GraphNode class.

Hope our questions get answered.

Best,

Svetozar

toza-mimoza · February 3, 2022, 7:33pm

@bparbhu It took a few days, but I have managed to run Rasa’s Dask graph on a Ray cluster.

Here are my observations:

To be clear, I had to disable the cache because it cannot be serialized (there is an SQLAlchemy object in a TrainingHook somewhere in the code). That is not important for my use case (bachelor thesis/research), but it is already a big disadvantage for Rasa 3.x+. I did not change the configuration or add any stories or intents; it was a plain rasa init chatbot.
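Before shipping graph nodes to a remote scheduler, it can help to check which objects fail to serialize. The sketch below uses plain pickle as a rough proxy (Ray actually uses its own cloudpickle-based serializer, which handles more cases, such as lambdas, but still cannot serialize live locks or connections). HypotheticalCacheHook is invented for illustration; it holds a threading.Lock as a stand-in for a non-serializable resource like the SQLAlchemy object mentioned above.

```python
import pickle
import threading
from typing import Any, Dict


class HypotheticalCacheHook:
    """Hypothetical stand-in for a hook holding a non-serializable resource."""

    def __init__(self) -> None:
        self.lock = threading.Lock()  # lock objects cannot be pickled


def find_unpicklable(objects: Dict[str, Any]) -> Dict[str, str]:
    """Return {name: exception type} for every object that pickle rejects."""
    problems: Dict[str, str] = {}
    for name, obj in objects.items():
        try:
            pickle.dumps(obj)
        except Exception as exc:  # pickle raises various errors for bad objects
            problems[name] = type(exc).__name__
    return problems


candidates = {
    "config": {"epochs": 100, "pipeline": ["tokenizer", "featurizer"]},
    "hook": HypotheticalCacheHook(),
}
print(find_unpicklable(candidates))  # {'hook': 'TypeError'}
```

Running a check like this over each node's inputs narrows down which component has to be disabled or made serializable.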

On Azure I have 3 VMs with 2 vCPUs each, 6 in total for the whole cluster. At any moment the Dask graph uses at most 5 vCPUs, most commonly 3-4, presumably because at most 4 graph nodes can run in parallel. That matches local testing on a single-node cluster on my computer, where I see no benefit from parallelizing beyond 4 threads.
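The plateau at roughly 4 parallel tasks is consistent with a general property of task graphs: no scheduler can run more tasks at once than there are mutually independent tasks available. A rough way to estimate this is to group nodes by dependency depth and count how many share each level. The dependency dict below is hypothetical (node names invented, not Rasa's actual training graph). Note that level widths are only a lower bound on the graph's true width: with unequal task durations, independent nodes on different levels can also overlap.

```python
from collections import Counter
from typing import Dict, List


def level_widths(deps: Dict[str, List[str]]) -> Dict[int, int]:
    """Map each depth level to how many nodes sit on it.

    A node's level is 0 if it has no dependencies, otherwise
    1 + the highest level among its dependencies.
    """
    levels: Dict[str, int] = {}

    def level(node: str) -> int:
        if node not in levels:
            parents = deps[node]
            levels[node] = 0 if not parents else 1 + max(level(p) for p in parents)
        return levels[node]

    for node in deps:
        level(node)
    return dict(Counter(levels.values()))


# Hypothetical training graph: four independent featurizers in the middle.
deps = {
    "read_data": [],
    "tokenize": ["read_data"],
    "feat_1": ["tokenize"],
    "feat_2": ["tokenize"],
    "feat_3": ["tokenize"],
    "feat_4": ["tokenize"],
    "classifier": ["feat_1", "feat_2", "feat_3", "feat_4"],
    "response_selector": ["feat_1", "feat_2", "feat_3", "feat_4"],
}

widths = level_widths(deps)  # {0: 1, 1: 1, 2: 4, 3: 2}
print(max(widths.values()))  # 4
```

For a graph shaped like this, a fifth or sixth worker would mostly sit idle, which is one plausible explanation for the observation above.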

So the best bet is to use Dask’s threaded scheduler rather than a cluster.
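To illustrate what switching from a synchronous to a threaded scheduler buys, here is a toy thread-pool executor for the same Dask-style graph convention (keys mapping to literals or (function, *args) tuples). It is a simplified sketch, not Dask's implementation: it runs the graph in waves, submitting every task whose inputs are ready to a concurrent.futures.ThreadPoolExecutor, whereas Dask's real threaded scheduler (dask.threaded.get) starts each task as soon as its own inputs finish. Threads only help when tasks release the GIL (I/O, sleeps, or native code such as TensorFlow ops). slow_identity and the example graph are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, Hashable, Set

Graph = Dict[Hashable, Any]


def is_task(value: Any) -> bool:
    return isinstance(value, tuple) and len(value) > 0 and callable(value[0])


def is_key(graph: Graph, obj: Any) -> bool:
    try:
        return obj in graph
    except TypeError:  # unhashable arguments are never keys
        return False


def dependencies(graph: Graph, key: Hashable) -> Set[Hashable]:
    value = graph[key]
    return {a for a in value[1:] if is_key(graph, a)} if is_task(value) else set()


def get_threaded(graph: Graph, num_workers: int = 4) -> Dict[Hashable, Any]:
    """Compute all keys; tasks whose inputs are ready run concurrently."""
    results: Dict[Hashable, Any] = {}
    pending = set(graph)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        while pending:
            ready = [k for k in pending if dependencies(graph, k).issubset(results)]
            if not ready:
                raise ValueError("cycle or missing key in task graph")
            futures = {}
            for k in ready:
                value = graph[k]
                if is_task(value):
                    func, *args = value
                    resolved = [results[a] if is_key(graph, a) else a for a in args]
                    futures[k] = pool.submit(func, *resolved)
                else:
                    results[k] = value  # literal data needs no worker
            for k, fut in futures.items():  # wait for the whole wave
                results[k] = fut.result()
            pending -= set(ready)
    return results


def slow_identity(x: int) -> int:
    time.sleep(0.2)  # simulates work that releases the GIL
    return x


example = {
    "data": 3,
    "a": (slow_identity, "data"),
    "b": (slow_identity, "data"),
    "c": (slow_identity, "data"),
    "d": (slow_identity, "data"),
    "total": (lambda *xs: sum(xs), "a", "b", "c", "d"),
}

start = time.perf_counter()
results = get_threaded(example, num_workers=4)
elapsed = time.perf_counter() - start
print(results["total"], f"{elapsed:.2f}s")  # total is 12; about 0.2s instead of 0.8s
```

The four independent tasks overlap, so the run takes roughly one task's duration instead of four. Adding workers beyond the number of independent tasks would not shorten it further.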
