r4sn4
(Rasna Tomar)
August 23, 2019, 3:58pm
1
Hi,
NLU training on my data is consuming a lot of time.
Details are as follows -
Data size - 3.7 MB
RAM size - 32 GB
No. of cores - 12
Training time - 40 mins
NLU config is as follows -
language: en
pipeline:
  - name: nlp_spacy
  - name: tokenizer_spacy
  - name: intent_entity_featurizer_regex
  - name: ner_synonyms
  - name: tokenizer_whitespace
  - name: ner_crf
  - name: intent_featurizer_count_vectors
    analyzer: 'word'
    min_ngram: 1  # int
    max_ngram: 1  # int
  - name: intent_classifier_tensorflow_embedding
    epochs: 100
Can anybody tell me why training is taking so long on such small data?
Thanks
r4sn4
(Rasna Tomar)
August 26, 2019, 3:46am
2
Does anybody have info about NLU training time?
shivangpatel
That’s strange! I think you should use a proper pipeline configuration (config.yml).
My data size - 16 MB for stories, and 0.5 MB max for nlu.md
RAM - 8 GB
CPU - Intel i7 @ 2.8 GHz, 8 cores
Training time - ~30 min
Config (config.yml):
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: CountVectorsFeaturizer
    token_pattern: (?u)\b\w+\b
  - name: EmbeddingIntentClassifier
    intent_tokenization_flag: true
    intent_split_symbol: "+"
  - name: DucklingHTTPExtractor
    url: http://localhost:8000
    dimensions:
      - number
      - location
      - time
  - name: EntitySynonymMapper
policies:
  - name: KerasPolicy
    batch_size: 50
    epochs: 500
    max_training_samples: 300
  - name: MemoizationPolicy
    max_history: 5
  #- name: FallbackPolicy
  #  core_threshold: 0.3
  #  fallback_action_name: "action_restart"
  - name: FormPolicy
  - name: MappingPolicy
  - name: TwoStageFallbackPolicy
    nlu_threshold: 0.3
    core_threshold: 0.3
    fallback_core_action_name: "action_restart"
    fallback_nlu_action_name: "utter_default"
    deny_suggestion_intent_name: "out_of_scope"
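One note on a config.yml like this: in Rasa 1.x the pipeline section is all that NLU training uses, while the policies section only affects Core training. If I remember the CLI correctly, rasa train nlu reads just the pipeline, so the policies block should have no effect on NLU training time.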
r4sn4
(Rasna Tomar)
August 26, 2019, 8:57am
4
@shivangpatel Thanks for the reply.
I will try this pipeline and get back to you.
r4sn4
(Rasna Tomar)
August 26, 2019, 11:23am
5
@shivangpatel how many epochs did you train your NLU model for?
shivangpatel
I didn’t change them; I used the default values. Stick with the defaults initially during development.
Tip 1: Once development is complete and your production setup (architecture + flow) is ready, then worry about tuning hyperparameters to get maximum accuracy out of the production model.
Tip 2: One more thing: add good-quality data to get good results.
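For example, when you do get to tuning, epochs and batch size are the usual knobs on the embedding classifier. A minimal sketch, assuming Rasa 1.x component names (the values are only illustrative; check the defaults in your version):
pipeline:
  - name: EmbeddingIntentClassifier
    epochs: 300            # fewer epochs train faster, possibly at some accuracy cost
    batch_size: [64, 256]  # batch size increases over training, if I recall the default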
r4sn4
(Rasna Tomar)
August 30, 2019, 7:06am
7
@shivangpatel
My Rasa Core training is not taking much time.
I asked about NLU training time only. The data size you mentioned for NLU is just 0.5 MB,
whereas I tested with NLU data of 3.7 MB.
shivangpatel
Then maybe it really does need that much time…
Try tagging a senior-level Rasa developer to get a response.
Anyway, how many intents do you have? And how many examples did you write for each intent?
souvikg10
(Souvik Ghosh)
August 31, 2019, 6:31pm
9
Can you explain why you need both the spaCy tokenizer and a whitespace tokenizer?
Also, which English model are you using: small, medium, or large?
As far as I remember, the TensorFlow embedding classifier doesn’t require spaCy embeddings, so perhaps you don’t need to load the spaCy vectors (nlp_spacy).
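To make that concrete, here is a minimal sketch of a spaCy-free pipeline, roughly what Rasa 1.x’s pre-configured supervised_embeddings pipeline expands to (adjust to your needs):
language: en
pipeline:
  - name: WhitespaceTokenizer        # single tokenizer, no spaCy model to load
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper
  - name: CountVectorsFeaturizer
  - name: EmbeddingIntentClassifier  # the TensorFlow embedding classifier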
r4sn4
(Rasna Tomar)
September 3, 2019, 5:39am
10
I am using only the spaCy tokenizer in the pipeline; I have edited the pipeline above.
I have now removed nlp_spacy from the pipeline, since, as you mentioned, the TensorFlow embedding classifier does not require it.
How can we specify the small, medium, or large English model in the pipeline?
Jitesh
Hey @r4sn4, you can specify the model as mentioned in the docs:
pipeline:
  - name: "SpacyNLP"
    # language model to load
    model: "en_core_web_md"
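(If the medium model is not installed yet, download it first with: python -m spacy download en_core_web_md. The sm/md/lg suffix in the model name is how you pick small, medium, or large.)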
souvikg10
(Souvik Ghosh)
September 4, 2019, 3:26pm
12
Your last question is only valid if you are using SpacyNLP, as explained by Jitesh.