Difficulties using the new recommended pipeline

I'm using rasa and rasa-sdk version 1.8.0. I tried the new pipeline (see my config below), first with the DIETClassifier at its default of 300 epochs and then with it lowered to 20.

In both cases, running the rasa interactive command prints a bunch of TensorFlow errors (or warnings?), and training gets aborted and drops back to the command line with no error message.

Should I limit the classifier to a single epoch? With just one epoch the losses were pretty high, and the bot’s intent classification and entity recognition were pretty poor.

Config file -

  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 20
  - name: EntitySynonymMapper
  - name: ResponseSelector
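
For context, the list above sits under the pipeline: key of config.yml; the surrounding file looks roughly like this (language shown as en, which I believe the ConveRT components require):

  language: en
  pipeline:
    - name: ConveRTTokenizer
    # ... the remaining components listed above ...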

TensorFlow errors/warnings (I don’t have an NVIDIA GPU) -

W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory

W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory

W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Hi there @ganeshv, the TensorFlow warnings are okay to ignore. We’re currently working on hiding them, but it requires TensorFlow to remove them, as they unfortunately aren’t suppressible from the rasa side.
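
In the meantime, if the noise bothers you, TensorFlow’s native logging can usually be quieted from the shell before starting rasa. This is a general TensorFlow environment variable, not a rasa option:

# Ask TensorFlow's C++ logger to drop INFO and WARNING messages
# (0 = everything, 1 = no INFO, 2 = no INFO/WARNING, 3 = errors only)
export TF_CPP_MIN_LOG_LEVEL=2
rasa train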

Can you try Rasa 1.8.2? You’ll almost definitely need more than 20 epochs for DIET, but training exiting like that is concerning. Can you post what you see when you run rasa train --debug? How much memory does your machine have?
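
Something along these lines will keep a full copy of the debug output in a file you can attach (tee is just a convenience here, not a rasa requirement):

# Train with debug logging; mirror everything printed into train-debug.log
rasa train --debug 2>&1 | tee train-debug.log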

Hello @erohmensing, thank you for your reply! I tried Rasa 1.8.2 with 100 epochs for the DIETClassifier and added the --debug flag. My training ended at 18%, and this time Python itself “quit unexpectedly”. My machine has 16 GB of memory.

Hm, that’s quite strange. How large is your data set?

Hello @erohmensing, sorry I missed your reply. My dataset has about 12 intents, with 35-40 examples at most and about 15 examples on average per intent.

Hello @erohmensing, I tried a few experiments to narrow this down. I re-installed my OS (macOS Catalina) and did a fresh install of Rasa.

I had four setups, described below. In each case, I had set up the DIETClassifier for 100 epochs -

  1. Running rasa train directly on my command line (inside a venv), with rasa and rasa-sdk version 1.10.0.
  2. Running rasa train inside a Docker container, with rasa and rasa-sdk version 1.8.0.
  3. Setup 2, but with rasa and rasa-sdk version 1.10.0.
  4. Running rasa train inside a Jenkins build, also with rasa and rasa-sdk version 1.8.0.

The command completed successfully in setups 1 and 4.

However, setups 2 and 3 continue to fail. In these setups I don’t get a dialog saying that Python quit unexpectedly; the training just silently ends somewhere between 9% and 19% of training the DIETClassifier.
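
For reference, setups 2 and 3 boiled down to something like the standard invocation from the Rasa Docker docs (illustrative; the image tag matched each setup’s rasa version):

# Setup 3: mount the project directory into the official image and train
docker run -v $(pwd):/app rasa/rasa:1.10.0-full train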

Could you advise on how I can troubleshoot this further?

@erohmensing - Found a way out for this one. Docker’s default setting allocates only 2 GB of memory, which isn’t enough. Once I raised it to 10 GB, DIET training completed successfully.
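
I raised it via Docker Desktop’s Preferences > Resources > Memory slider, which is what governs the Docker VM on macOS. For anyone applying a per-container limit instead, the flag looks like this (illustrative; still capped by the Docker Desktop VM on a Mac):

# Allow the training container up to 10 GB of memory
docker run --memory=10g -v $(pwd):/app rasa/rasa:1.10.0-full train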

Could you add this to the documentation for the pipeline as well? The silent failure, even with the --debug flag, was quite frustrating to troubleshoot.