How to use a GPU on Colab, or anywhere?

I was testing the helpdesk-assistant from https://github.com/RasaHQ/helpdesk-assistant.

  1. Set the Colab runtime type to GPU.
  2. Run the following in cells:
!git clone https://github.com/RasaHQ/helpdesk-assistant.git
%cd helpdesk-assistant
!git checkout 4c6d31c  # move back to Rasa 2, since a forum post says Rasa 3 and GPU don't work together
!pip install -U pip==20.2
!pip install -r requirements.txt
!pip install -U ipython

# Manually restart the runtime here, then continue
!pip install colab-xterm
%load_ext colabxterm
%xterm

Run watch nvidia-smi inside xterm to see GPU usage refreshed every 2 seconds. As an aside, xterm is non-blocking, so it also seems useful for running the action server, unlike the blocking !bash.
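For logging the same numbers from a notebook cell instead of watching them in xterm, something like the following might work. It is only a sketch: the helper names parse_gpu_line and sample_gpus are made up for illustration, and it assumes every queried field comes back as a plain integer (some GPUs report "[N/A]" for certain fields, which this does not handle).

```python
import shutil
import subprocess

# Fields to request from nvidia-smi; values come back as CSV without units.
QUERY = "utilization.gpu,memory.used,memory.total"


def parse_gpu_line(line):
    """Parse one CSV line such as '40, 930, 15109' into a dict."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return {"util_pct": util, "mem_used_mib": used, "mem_total_mib": total}


def sample_gpus():
    """Return one reading per GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling sample_gpus() in a loop with time.sleep(2) in a background thread while rasa train runs would give a rough log of utilization and memory over time.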

%cd helpdesk-assistant/
!rasa train
  • On my Mac, training took 4 min.
  • On SageMaker Studio Lab, it took 3.5 min on CPU and 3 min on GPU.
  • On the Colab CPU runtime, it took 8 min.
  • On the Colab GPU runtime with pip install tensorflow-gpu, it took 5 min (DIET took 1:16).
  • GPU-Util peaked at 40% during DIETClassifier and 13% during a section about "Processed trackers" (I’m not sure what this trains; it later ends with "Core model training completed.").
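To make timings like these easier to compare across machines, the NLU and Core stages could be timed separately. A minimal sketch, assuming rasa is on the PATH and the current directory is the project root; the helper names are hypothetical:

```python
import subprocess
import time


def format_duration(seconds):
    """Format a duration in seconds as M:SS, e.g. 76 -> '1:16'."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def time_command(args):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(args)
    return completed.returncode, time.perf_counter() - start


def time_rasa_stages():
    """Time NLU and Core training separately (assumes rasa is on PATH)."""
    for stage in ("nlu", "core"):
        code, elapsed = time_command(["rasa", "train", stage])
        print(f"rasa train {stage}: exit {code}, {format_duration(elapsed)}")
```

Running time_rasa_stages() once on each runtime would show whether the gap comes from the NLU stage (DIET) or from Core.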
  1. Why is Colab’s GPU training time even longer than CPU training on my Mac and on SageMaker Studio Lab?
  2. Is the GPU really being used? watch nvidia-smi shows a maximum of about 930MiB / 15109MiB of memory in use during rasa train (specifically during the NLU part, on DIET).
  3. Why does the Rasa 3 version of helpdesk-assistant also take 8 min on GPU? Is the GPU really being used? This post seems to say the GPU was used on Rasa 2 but not on Rasa 3 (How to train Rasa3 nlu on gpu). However, the difference between his Rasa 2 and Rasa 3 timings is far bigger than my 5 min vs 8 min, and I doubt that GPU memory allocation or other overhead accounts for that much time.
  4. Are there any guidelines for development workflows in Rasa, given that each tiny edit requires a full rasa train before we can verify the result? I can’t imagine waiting 10 min for rasa train per change. Even if we break it down into rasa train nlu and rasa train core, it is still unlikely to scale. Even 1 min of training on a rasa init project feels slow.
  5. Are there any resources or tutorials from people using a GPU? I know there’s an article here, but the author uses Docker, and SageMaker and Colab cannot run Docker. How do we work with Docker on Colab, given that helpdesk-assistant needs Docker for docker run -p 8000:8000 rasa/duckling? (Yes, I know we can install Duckling directly to avoid Docker, but is there a more direct solution that lets us follow the README and use Docker?)
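On questions 2 and 3, one check that does not depend on reading nvidia-smi is to ask TensorFlow which GPUs it can see in the same environment where rasa train runs. A minimal sketch, assuming a TensorFlow 2.x install; visible_gpus and describe are illustrative names, not part of Rasa or TensorFlow:

```python
def visible_gpus():
    """Return the GPU device names TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


def describe(gpus):
    """Turn the result of visible_gpus() into a one-line summary."""
    if gpus is None:
        return "TensorFlow is not installed in this environment."
    if not gpus:
        return "TensorFlow is installed but sees no GPU; training would run on CPU."
    return f"TensorFlow sees {len(gpus)} GPU(s): {', '.join(gpus)}"
```

Running print(describe(visible_gpus())) in a cell for both the Rasa 2 and Rasa 3 setups should show whether the installed TensorFlow build detects the Colab GPU at all. An empty list would point to a TensorFlow/CUDA mismatch rather than to Rasa itself.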