NLU training taking a lot of time

Hi,

NLU training on my data is consuming a lot of time.

Details are as follows -

Data size - 3.7 MB
RAM size - 32 GB
No. of cores - 12
Training time - 40 mins

NLU config is as follows -

language: en
pipeline:
- name: nlp_spacy
- name: tokenizer_spacy
- name: intent_entity_featurizer_regex
- name: ner_synonyms
- name: tokenizer_whitespace
- name: ner_crf
- name: intent_featurizer_count_vectors
  analyzer: 'word'
  min_ngram: 1  # int
  max_ngram: 1  # int
- name: intent_classifier_tensorflow_embedding
  epochs: 100

Can anybody tell me why training is taking so long on such a small dataset?

Thanks

Does anybody have info about NLU training time?

That’s strange! I think you should use a proper pipeline configuration (config.yml).

My data size - 16 MB for stories, and 0.5 MB max for nlu.md

RAM - 8GB

CPU - i7 @ 2.8 GHz, 8 cores

Training time - ~30 min

NLU config:

language: en
pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: CRFEntityExtractor
- name: CountVectorsFeaturizer
  token_pattern: (?u)\b\w+\b
- name: EmbeddingIntentClassifier
  intent_tokenization_flag: true
  intent_split_symbol: "+"
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
    - number
    - location
    - time
- name: EntitySynonymMapper

policies:
- name: KerasPolicy
  batch_size: 50
  epochs: 500
  max_training_samples: 300
- name: MemoizationPolicy
  max_history: 5
#- name: FallbackPolicy
#  core_threshold: 0.3
#  fallback_action_name: "action_restart"
- name: FormPolicy
- name: MappingPolicy
- name: TwoStageFallbackPolicy
  nlu_threshold: 0.3
  core_threshold: 0.3
  fallback_core_action_name: "action_restart"
  fallback_nlu_action_name: "utter_default"
  deny_suggestion_intent_name: "out_of_scope"
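
Side note: the DucklingHTTPExtractor entry above assumes a Duckling server is already listening on localhost:8000; it won’t extract anything otherwise. Assuming Docker is available, one common way to start one is:

docker run -p 8000:8000 rasa/duckling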

@shivangpatel Thanks for the reply. I will try this pipeline and get back to you.

@shivangpatel how many epochs did you train your NLU model for?

I didn’t change it; I used the default values. Stick with the defaults initially during development.

Tip 1: Once development is complete and your production setup (architecture + flow) is ready, then worry about tuning hyperparameters for the production model to get maximum accuracy.

Tip 2: One more thing - add good-quality data to get good results.
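
For example, when that stage comes, hyperparameters can be overridden per pipeline component in config.yml; a minimal sketch (the epoch count here is purely illustrative):

- name: EmbeddingIntentClassifier
  epochs: 300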

@shivangpatel My Rasa Core training is not taking much time; I asked about NLU training time only. The data size you mentioned for NLU is just 0.5 MB, whereas I tested with NLU data of 3.7 MB.

Then maybe it just requires that much time…
Try tagging a senior-level Rasa developer to get a response.

Anyway, how many intents do you have? And how many examples did you write for each intent?

Can you explain why you need both the spaCy tokenizer and the whitespace tokenizer? Also, which English model are you using: small, medium, or large?

As far as I remember, the TensorFlow embedding classifier doesn’t require spaCy embeddings, so perhaps you don’t need to load the spaCy vectors (nlp_spacy).
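
If I remember correctly, the pre-configured tensorflow_embedding pipeline from the Rasa NLU docs uses no spaCy components at all; a minimal sketch would be:

language: en
pipeline:
- name: tokenizer_whitespace
- name: ner_crf
- name: ner_synonyms
- name: intent_featurizer_count_vectors
- name: intent_classifier_tensorflow_embedding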

I am using only the spaCy tokenizer in the pipeline. I have edited the pipeline above.

I have removed nlp_spacy from the pipeline now, since, as you mentioned, the TensorFlow embedding classifier does not require it.

How can we specify the small, medium, or large English model in the pipeline?

Hey @r4sn4, you can specify the model as mentioned in the docs:

pipeline:
- name: "SpacyNLP"
  # language model to load
  model: "en_core_web_md"

Your last question only applies if you are using SpacyNLP, as Jitesh explained.

Ok…