NLU training taking a lot of time


NLU training on my data is consuming a lot of time.

Details are as follows -

Data size - 3.7 MB
RAM size - 32 GB
No. of cores - 12
Training time - 40 mins

NLU config is as follows -

language: en

pipeline:
- name: nlp_spacy
- name: tokenizer_spacy
- name: intent_entity_featurizer_regex
- name: ner_synonyms
- name: tokenizer_whitespace
- name: ner_crf
- name: intent_featurizer_count_vectors
  analyzer: 'word'
  min_ngram: 1  # int
  max_ngram: 1  # int
- name: intent_classifier_tensorflow_embedding
  epochs: 100

Can anybody tell me why training is taking so long on such a small data set?


Does anybody have info about NLU training time?

That’s strange! I think you should use a proper pipeline configuration (config.yml).

My data size - 16 MB of stories, and 0.5 MB max for NLU


Cores - 8 (i7 @ 2.8 GHz)

Training time - ~30 min

NLU config:

language: en
pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: CRFEntityExtractor
- name: CountVectorsFeaturizer
  token_pattern: (?u)\b\w+\b
- name: EmbeddingIntentClassifier
  intent_tokenization_flag: true
  intent_split_symbol: "+"
- name: DucklingHTTPExtractor
  url: http://localhost:8000
  dimensions:
  - number
  - location
  - time
- name: EntitySynonymMapper
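Incidentally, the `token_pattern` on CountVectorsFeaturizer above overrides how text is split into tokens. A plain-Python sketch of what that regex matches (this is standard `re` behaviour, not Rasa code):

```python
import re

# (?u)\b\w+\b keeps runs of word characters and drops punctuation.
# Unlike sklearn's default pattern (\w\w+), it also keeps 1-character tokens.
token_pattern = re.compile(r"(?u)\b\w+\b")

tokens = token_pattern.findall("What's the weather in N.Y.?")
print(tokens)  # ['What', 's', 'the', 'weather', 'in', 'N', 'Y']
```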

policies:
- name: KerasPolicy
  batch_size: 50
  epochs: 500
  max_training_samples: 300
- name: MemoizationPolicy
  max_history: 5
#- name: FallbackPolicy
#  core_threshold: 0.3
#  fallback_action_name: "action_restart"
- name: FormPolicy
- name: MappingPolicy
- name: TwoStageFallbackPolicy
  nlu_threshold: 0.3
  core_threshold: 0.3
  fallback_core_action_name: "action_restart"
  fallback_nlu_action_name: "utter_default"
  deny_suggestion_intent_name: "out_of_scope"
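The `intent_tokenization_flag` / `intent_split_symbol` pair in the NLU pipeline above tells the embedding classifier to treat composite intent labels as multiple tokens. Conceptually (a plain-Python sketch, not Rasa’s actual implementation):

```python
def split_intent(label: str, split_symbol: str = "+") -> list:
    """Split a composite intent label into its component intents."""
    return label.split(split_symbol)

# With intent_split_symbol: "+", a label like "greet+ask_weather"
# is embedded as two intent tokens rather than one opaque label.
print(split_intent("greet+ask_weather"))  # ['greet', 'ask_weather']
```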

@shivangpatel Thanks for the reply. I will check this pipeline and get back to you.

@shivangpatel how many epochs did you train your NLU model for?

I didn’t change them; I used the default values. Stick with the defaults initially during development.

Tip 1: Once development is complete and your production setup (architecture + flow) is ready, then worry about tuning hyperparameters to get maximum accuracy out of the production model.
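For example, the embedding classifier’s hyperparameters can be tuned directly in the pipeline config. The values below are the pre-1.0 `tensorflow_embedding` defaults, shown for illustration rather than as recommendations:

```yaml
- name: intent_classifier_tensorflow_embedding
  epochs: 300            # more epochs fit better but train longer
  batch_size: [64, 256]  # batch size grows linearly across epochs
  embed_dim: 20          # dimensionality of the embedding space
```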

Tip 2: One more thing - add good-quality data to get good results.

@shivangpatel My Rasa Core training is not taking much time; I asked about NLU training time only. The data size you mentioned for NLU is just 0.5 MB, whereas I tested with NLU data of 3.7 MB.

Then maybe it really does require that much time…
Try tagging a senior-level Rasa developer to get a response.

Anyway, how many intents do you have? And how many examples did you write for each intent?

Can you explain why you need both the spaCy tokenizer and a whitespace tokenizer? Also, which English model are you using - sm, md, or lg?

As far as I remember, the tensorflow embedding classifier doesn’t require spaCy embeddings, so perhaps you don’t need to load the spaCy vectors (nlp_spacy).
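For reference, a minimal spaCy-free sketch of such a pipeline (component names from the pre-1.0 Rasa NLU registry; adjust to your version):

```yaml
language: en
pipeline:
- name: tokenizer_whitespace
- name: intent_entity_featurizer_regex
- name: ner_crf
- name: ner_synonyms
- name: intent_featurizer_count_vectors
- name: intent_classifier_tensorflow_embedding
  epochs: 100
```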

I am using only the spaCy tokenizer in the pipeline; I have edited the pipeline above.

I have removed nlp_spacy from the pipeline now, since you mentioned the tensorflow embedding classifier does not require it.

How can we specify the small, medium, or large English model in the pipeline?

Hey @r4sn4, you can specify the model as mentioned in the docs:

- name: "SpacyNLP"
  # language model to load
  model: "en_core_web_md"

Your last question is only valid if you are using SpacyNLP, as Jitesh explained.