Rasa 1.10 DIETClassifier CPU and training speed issues

Hi everyone, we recently migrated from Rasa 1.6 to Rasa 1.10.12. Since then we have been seeing CPU issues (training needs more than 4 cores) and training speed issues (training takes over 30 minutes). In many cases the training simply hangs. Our 1.6 pipeline never had CPU issues and finished training our model in under 10 minutes. Could this be an issue with the DIETClassifier? We do all our training on a CPU and don't have access to a GPU. I have attached both our 1.6 pipeline and our 1.10 pipeline below. We also have a fairly large amount of training data: about 4,000 unique intent examples and 384 stories.

Rasa 1.6 pipeline

language: "en"

pipeline:
- name: "SpacyNLP"
  model: "en_core_web_md"
  case_sensitive: false
- name: "WhitespaceTokenizer"
  case_sensitive: false
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
- name: "CountVectorsFeaturizer"
  analyzer: "char_wb"
  min_ngram: 1 
  max_ngram: 4
- name: "EmbeddingIntentClassifier"
  
policies:
  - name: "FormPolicy"
  - name: "KerasPolicy"
    epochs: 150
    featurizer:
      - name: MaxHistoryTrackerFeaturizer
        max_history: 4
        state_featurizer:
          - name: BinarySingleStateFeaturizer
  - name: "MemoizationPolicy"
    max_history: 4
  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    core_threshold: 0.4
    ambiguity_threshold: 0.01
    fallback_action_name: 'action_custom_fallback'

Rasa 1.10.12 pipeline

language: "en"

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    hidden_layers_sizes:
      text: [256, 128]
    number_of_transformer_layers: 0
    weight_sparsity: 0
    intent_classification: True
    entity_recognition: False
    use_masked_language_model: False
    BILOU_flag: False
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100



policies:
  - name: "FormPolicy"
  - name: "KerasPolicy"
    epochs: 150
    featurizer:
      - name: MaxHistoryTrackerFeaturizer
        max_history: 4
        state_featurizer:
          - name: BinarySingleStateFeaturizer
  - name: "MemoizationPolicy"
    max_history: 4
  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    core_threshold: 0.4
    ambiguity_threshold: 0.01
    fallback_action_name: 'action_custom_fallback'
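
To make the difference between the two configs easier to see at a glance, here is a small illustrative Python sketch that lists which NLU components were removed and which were added (the policies sections are identical in both configs). The component names are copied from the pipelines above with their parameters omitted; `PIPELINE_1_6`, `PIPELINE_1_10` and `pipeline_diff` are names made up for this post and are not part of Rasa.

```python
# Component names copied from the two pipelines above (parameters omitted).
PIPELINE_1_6 = [
    "SpacyNLP",
    "WhitespaceTokenizer",
    "RegexFeaturizer",
    "CRFEntityExtractor",
    "EntitySynonymMapper",
    "CountVectorsFeaturizer",
    "CountVectorsFeaturizer",
    "EmbeddingIntentClassifier",
]

PIPELINE_1_10 = [
    "WhitespaceTokenizer",
    "RegexFeaturizer",
    "LexicalSyntacticFeaturizer",
    "CountVectorsFeaturizer",
    "CountVectorsFeaturizer",
    "DIETClassifier",
    "EntitySynonymMapper",
    "ResponseSelector",
]


def pipeline_diff(old, new):
    """Return (removed, added) component names, keeping pipeline order."""
    # dict.fromkeys drops duplicates (e.g. the two CountVectorsFeaturizer
    # entries) while preserving the order in which names first appear.
    removed = [name for name in dict.fromkeys(old) if name not in new]
    added = [name for name in dict.fromkeys(new) if name not in old]
    return removed, added


removed, added = pipeline_diff(PIPELINE_1_6, PIPELINE_1_10)
print("Removed in 1.10:", removed)
print("Added in 1.10:", added)
```

So the only components that are new in our 1.10 config are LexicalSyntacticFeaturizer, DIETClassifier and ResponseSelector, which is why we suspect one of them is behind the slowdown.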

Thank you for your help!