Rasa 1.10.14 DIETClassifier takes a very long time and is not using the GPU

Hi, I'm training a model for an important client (58 intents, an average of 115 utterances per intent) on Rasa Open Source (NLU only) 1.10.14, but it takes more than 2 hours to train and it does not seem to be using the GPU. The VM is very powerful: 2 x Tesla V100, 12 vCPUs and 224 GB of RAM. Here is the config file:

```yaml
# Configuration for Rasa NLU.
language: "en"

pipeline:
  - name: HFTransformersNLP
    model_weights: "roberta-large-mnli"
    model_name: "roberta"
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 300
    num_transformer_layers: 4
    transformer_size: 256
    use_masked_language_model: True
    drop_rate: 0.25
    weight_sparsity: 0.7
    batch_size: [64, 256]
    embedding_dimension: 30
    hidden_layer_sizes:
      text: [512, 128]

# Configuration for Rasa Core.
policies:
  # TED policy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    batch_size: 50
  - name: MemoizationPolicy
  - name: MappingPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.3
    core_threshold: 0.3
    ambiguity_threshold: 0.02
```

Here is the progress bar while training with 400 epochs (even with 300 the total time is almost the same):

```
Epochs:  20%|████▉     | 82/400 [29:34<1:47:52, 20.36s/it, t_loss=4.797, m_loss=0.757, i_loss=0.127, entity_loss=0.235, m_acc=0.914, i_acc=0.981, entity_f1=0.937]
```
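As a quick sanity check (my own arithmetic, not part of the original logs), the 20.36 s/epoch figure shown by the progress bar already implies a multi-hour run:

```python
# Back-of-the-envelope estimate from the tqdm progress bar above:
# 20.36 s per epoch, 400 epochs in total.
seconds_per_epoch = 20.36
total_epochs = 400

total_seconds = seconds_per_epoch * total_epochs
print(f"Estimated full run: {total_seconds / 3600:.2f} h")  # ~2.26 h
```

So at this per-epoch speed the 2+ hour wall-clock time is expected; the real question is why each epoch is so slow (i.e., whether the GPU is actually being used).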

Here is the `nvidia-smi` log:

Here is a screenshot of the training phase with the expected time:

Any suggestions on how to reduce the training time (without changing the training data or the config, as they belong to the client)? And any idea why the GPU is not being used? I'm happy to provide further details on request. Thanks a lot! Paolo
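One thing I can check on my side (a minimal sketch of an assumption, not something from the logs above): whether the TensorFlow installed in the Rasa virtualenv can see the GPUs at all. Rasa 1.10.x pins TensorFlow 2.1, which requires CUDA 10.1 and cuDNN 7.6; if those shared libraries are missing or mismatched, TF silently falls back to the CPU even though `nvidia-smi` shows the cards.

```python
def gpu_available() -> bool:
    """Return True if TensorFlow can see at least one GPU.

    The import is done inside the function so the check degrades
    gracefully when TensorFlow is not installed in the environment.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return False
    # tf.config.list_physical_devices is available from TF 2.1 onwards,
    # which is the version Rasa 1.10.x depends on.
    return len(tf.config.list_physical_devices("GPU")) > 0

print("GPU visible to TensorFlow:", gpu_available())
```

If this prints `False` while `nvidia-smi` shows the two V100s, the problem is the CUDA/cuDNN setup of the environment rather than the Rasa config.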