Rasa Docker 0.29.3 - Train Model only creating 40 % CPU load?

Hi, when training my model by clicking “Train”, I only see about 40-50 % CPU load on the machine via nmon, on an Intel® Xeon® CPU E5-2680 v3 @ 2.50GHz with 8 logical cores.

Is there something I could do to make it use all available CPU power and finish faster?

Are there any options for how many cores can be used and at what utilization?

On a side note, I updated to 0.29.3 to check this; I was using 0.28.x before. The RASA_HOME variable is currently not respected on 0.29.x versions; the install always ends up in /etc/rasa.

Thanks

This is definitely me thinking out loud, but how big is your training data? If the dataset isn’t huge, it could just be Amdahl’s law.
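To make the Amdahl’s law point concrete, here is a minimal sketch of the formula. The parallel fraction of 0.5 in the demo is purely an illustrative assumption, not something measured on a Rasa training run; the point is only that once a sizeable part of the work is serial, adding cores beyond a handful buys very little.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel.

    parallel_fraction: share of the single-core runtime that parallelizes (0..1).
    cores: number of cores working on the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # 0.5 is an assumed value for illustration only.
    assumed_parallel_fraction = 0.5
    for cores in (1, 8, 24):
        speedup = amdahl_speedup(assumed_parallel_fraction, cores)
        print(f"{cores:>2} cores: at most {speedup:.2f}x faster")
```

Under that assumption the bound barely moves between 8 and 24 cores, and the extra cores would mostly sit idle.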

The dataset is 45 intents with about 2100 NLU examples. There was no improvement going from 8 to 24 cores, except that the utilization per core is lower. So it looks like the number of threads is limited?

The pipeline used is:

language: de
pipeline:
  - name: SpacyNLP
    case_sensitive: false
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.3
    ambiguity_threshold: 0.1
    core_threshold: 0.3
    fallback_action_name: utter_anderes_problem_de

Now, with almost 6800 NLU inputs, training time is even worse: far over 40 minutes, and still not all CPU power is used. Is there any way to use all those cores (24)? Memory on the system is 50 GB, which should be plenty.

Did you try configuring TensorFlow? See: TensorFlow Configuration

Unfortunately that link is dead by now, but it should be this one: TensorFlow Configuration

I set ENV_CPU_INTER_OP_CONFIG=24 and ENV_CPU_INTRA_OP_CONFIG=24, but it did not change the behaviour (e.g. training speed), even though the values get picked up. I checked this by assigning an incorrect value to those environment variables, which puts error lines into the log.

The docs state that the default value is 0 and does not limit threads, which somewhat makes sense, since performance did not change. It is still strange that the cores have so much idle time; I would expect a completely utilized system.
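One way to narrow down the idle time is to look at per-core load rather than the average: a few saturated cores next to many idle ones would point to a mostly serial phase, while evenly low load would point to threads waiting on each other. The following is a rough sketch, assuming a Linux host where /proc/stat is readable; the function names are made up for illustration and it is not part of Rasa or nmon.

```python
import os
import time


def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} for each per-core line."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        ticks = [int(value) for value in parts[1:]]
        if len(ticks) < 4:
            continue
        # Fields: user nice system idle iowait irq softirq steal (guest is in user).
        total = sum(ticks[:8])
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        result[parts[0]] = (total - idle, total)
    return result


def per_core_busy_percent(before_text, after_text):
    """Compare two /proc/stat snapshots and return busy percent per core."""
    before = parse_proc_stat(before_text)
    after = parse_proc_stat(after_text)
    usage = {}
    for core, (busy_after, total_after) in after.items():
        if core not in before:
            continue
        busy_before, total_before = before[core]
        delta_total = total_after - total_before
        delta_busy = busy_after - busy_before
        usage[core] = 100.0 * delta_busy / delta_total if delta_total > 0 else 0.0
    return usage


def read_proc_stat(path="/proc/stat"):
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = read_proc_stat()
    time.sleep(1.0)  # sampling interval; run this while training is in progress
    second = read_proc_stat()
    for core, percent in sorted(per_core_busy_percent(first, second).items()):
        print(f"{core}: {percent:5.1f} % busy")
```

Running it a few times during different training phases (featurization, DIETClassifier epochs, TEDPolicy epochs) should show whether the load pattern changes between them.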