Incredibly Slow NLU Training of Dialogflow Exported Intents

I have about 800 intents, averaging around 20 training phrases each, exported from Dialogflow.

I used the Rasa tooling to convert that export into a Rasa NLU training file.
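
For reference, the conversion step was roughly the following (a minimal sketch; the exact flags depend on your Rasa version, and the paths here are placeholders):

# convert the Dialogflow export directory into Rasa NLU training data
rasa data convert nlu --data ./dialogflow_export --out ./data --format yaml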

This is only NLU training.
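
To be explicit, the training step covers just the NLU model, i.e. roughly this (paths are placeholders; the config is the one pasted further down):

# train only the NLU model on the converted data
rasa train nlu --config config.yml --nlu data/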

I am now attempting to train this model. I have tested the following GCP virtual machine configurations, running Debian:

224 vCPUs + 224 GB RAM => ~26 hours of training.

96 vCPUs + 86 GB RAM + 8 NVIDIA Tesla GPUs (with tensorflow-gpu installed; see the GPU check below) => ~38 hours of training.
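
As a quick sanity check that the TensorFlow backend actually sees the GPUs (a sketch, assuming a TensorFlow 2.x install):

# list the GPUs TensorFlow can see; an empty list means training ran on CPU only
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"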


Obviously this is not sustainable, especially since we have to retrain our model daily. Even the most expensive and most powerful Google VM cannot handle this, yet the exact same dataset takes less than an hour to train in Dialogflow.


I also tested removing all entities and synonyms; it made no difference to training time. I am using the default config.yml values, i.e. 100 epochs, etc.


So, I would love to get some input, insights, ideas, whatever: how do I get Rasa to train this dataset in a reasonable time? And no, more RAM, GPU and CPU is NOT THE ANSWER :)

Thank you

What pipeline are you using? Can you paste your config.yml?

I am using the config generated by Rasa:

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
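
For what it's worth, the main knob for training time in this pipeline is the DIETClassifier epoch count. A minimal sketch of a lighter variant (assumptions: a lower epoch count is acceptable for your accuracy target, which you would want to verify with rasa test nlu, and there are no retrieval intents, so ResponseSelector is omitted):

language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 50   # half the default 100; training time scales roughly linearly with epochs
  - name: EntitySynonymMapper
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1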