Training on GPU-enabled VMs

Hi, the training data for our bots is growing at a rapid pace, and training on developers' CPU machines or the general-purpose VMs available for CI is no longer sustainable. For some bots, our devs have to wait 1.5 hours after adding just a few NLU examples.

We’re looking to establish a workflow that uses GPU-enabled clusters, either Databricks, Azure ML, or Azure-provided GPU instances. We're trying to work out the best way to package our code and submit training jobs to these clusters/VMs remotely. We're also considering using the SDK directly for training.
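For the Azure ML route, one option we've sketched is the v2 CLI with a command-job YAML spec, which packages a local code folder and runs it on a GPU compute cluster. This is only a rough sketch of what we have in mind; every name below (the script, data asset, environment, and compute target) is a placeholder, not our actual setup:

```yaml
# job.yml - submit with: az ml job create --file job.yml
# All names (code path, script, data asset, environment, compute) are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
command: python train.py --data ${{inputs.training_data}}
code: ./src                               # local folder uploaded with the job
inputs:
  training_data:
    type: uri_folder
    path: azureml:nlu-examples@latest     # placeholder registered data asset
environment: azureml:AzureML-pytorch-1.13-ubuntu20.04-py38-cuda11.7-gpu@latest
compute: azureml:gpu-cluster              # placeholder GPU compute cluster
```

Does this kind of setup work well in practice, or is the Python SDK (or Databricks) a better fit for this?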

Any insights, tips, or tricks are highly appreciated.