Training killed when using DIET config

Hi, really nice to see the new DIET classifier, good job Rasa team.

I got a problem when training DIET.

My process was killed because of extreme resource starvation, even though I only tried to use a light config:

```yaml
language: en
pipeline:
  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 20
    learning_rate: 0.005
    number_of_transformer_layers: 0
    embedding_dimension: 10
    weight_sparsity: 0.90
    hidden_layer_sizes:
      text: [256, 128]
policies:
  - name: EmbeddingPolicy
    max_history: 10
    epochs: 20
    batch_size: [32, 64]
  - name: AugmentedMemoizationPolicy
    max_history: 6
  - name: TwoStageFallbackPolicy
    core_threshold: 0.3
    nlu_threshold: 0.8
  - name: FormPolicy
  - name: MappingPolicy
```

My question is: is there a hardware requirement for training DIET? I am using a GCP compute engine with 4 vCPUs and 26 GB of memory, but it seems that's not enough.


What is the size of your training data?

The config you provided doesn't correspond to the log.

Thanks for your response. My nlu.md file is 314,836 bytes (373 KB on disk). Sorry, I pasted the wrong one; my config is:

```yaml
pipeline:
  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: EntitySynonymMapper
  - name: DIETClassifier
    intent_classification: True
    entity_recognition: True
    use_masked_language_model: False
    number_of_transformer_layers: 0
policies:
  - name: EmbeddingPolicy
    max_history: 10
    epochs: 20
    batch_size: [32, 64]
  - name: AugmentedMemoizationPolicy
    max_history: 6
  - name: TwoStageFallbackPolicy
    core_threshold: 0.3
    nlu_threshold: 0.8
  - name: FormPolicy
  - name: MappingPolicy
```

Or should I split intent classification and entity extraction by using two DIETClassifier components?
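i.e. something like this, just to illustrate what I mean (I'm not sure whether two DIETClassifier components in one pipeline is actually supported, so treat this as a sketch):

```yaml
  - name: DIETClassifier          # intents only
    intent_classification: True
    entity_recognition: False
    number_of_transformer_layers: 0
  - name: DIETClassifier          # entities only
    intent_classification: False
    entity_recognition: True
    number_of_transformer_layers: 0
```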

Try removing the second CountVectorsFeaturizer (the one with analyzer: char_wb, min_ngram: 1, max_ngram: 4) and see whether it starts working.

Thanks for your suggestion. I removed both CountVectorsFeaturizer components, and unfortunately it's still not working.

Is there a way to find out why the process was killed?
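On Linux, a process killed under memory pressure is usually terminated by the kernel's OOM killer, which leaves a trace in the kernel log. A quick check (the exact message wording varies by kernel version; the sample line below is made up for illustration):

```shell
# On the affected machine, run:
#   dmesg | grep -iE "out of memory|killed process"
# Here the grep pattern is demonstrated on a sample OOM-killer log line:
sample="Out of memory: Killed process 4321 (rasa) total-vm:27262976kB"
printf '%s\n' "$sample" | grep -icE "out of memory|killed process"
```

If that prints a match, the kill came from memory exhaustion rather than CPU load.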

Yes, I was monitoring the CPU and memory usage during training; I think it was killed because the resource usage was too high.
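In case anyone wants to reproduce the measurement, here is a stdlib-only sketch for checking a Python process's peak memory from inside the process (assumes Linux or macOS, where the `resource` module is available; note the unit difference between the two platforms):

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of the current process, in MB.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return rss / 1024 ** 2
    return rss / 1024


if __name__ == "__main__":
    # Allocate ~50 MB so the peak visibly grows, then report it.
    big = bytearray(50 * 1024 * 1024)
    print(f"peak RSS: {peak_rss_mb():.1f} MB")
```

Polling this (or an external tool like `top`) while training runs shows whether memory climbs steadily until the kill.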

I have the same problem, have you solved it?