Rasa train hangs

Hi!

I am having a problem with Rasa when running these commands:

  • rasa train --num-threads -1

  • rasa test nlu --cross-validation

When running either of these commands, the process randomly hangs. Sometimes it runs fine, sometimes it hangs. I have already checked and it is not a memory problem, but I haven’t found the cause yet.

Looking at htop, I see that the following process is using around 100% of CPU, although the memory is not fully used:

/home/xxxx/.virtualenvs/implementation38/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-8 --pipe 30

Here is my data rasa_forum.tar.gz (7.4 KB)

Python version 3.8

Rasa version 2.8.2 (I also used 2.8.19 and I had the same behaviour)

Here is a print of logs:

Can someone help me with that? Thanks

@dsmendes can you please share the output of rasa --version and your system configuration, such as CPU, RAM, and HDD space? That’s a really interesting issue.

@dsmendes Can I ask why you need cross-validation? Are you comparing NLU performance? Please share brief info about your use case, if you don’t mind. Thanks.

Thanks @nik202

RASA:

  • Rasa Version : 2.8.2
  • Minimum Compatible Version: 2.8.0
  • Rasa SDK Version : 2.8.2
  • Rasa X Version : None
  • Python Version : 3.8.0
  • Operating System : Linux-5.4.0-91-generic-x86_64-with-glibc2.27

SYSTEM

image

We use --cross-validation to compare NLU performance.

Sorry, I uploaded only part of our NLU data; the specific use case was not shared. But the given NLU data is enough to reproduce the behaviour.

@dsmendes do you have 4 GB of RAM on your Linux machine? Just confirming.

Overall I have 15 GB. When this happens, I can confirm that it uses about 700 MB.

@dsmendes are you able to train the model easily, or does it also show error or warning messages? Are you using a customised pipeline in config.yml?

I customized the suggested config. It is in the .tar in the post.

When it trains, everything runs fine; however, when it hangs, the logs are:

Training NLU model...
2022-01-06 09:48:11 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en_core_web_md'
2022-01-06 09:48:13 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en_core_web_md'.
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 1706 (14 distinct intents)

2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'cost', 'deny', 'consumption_comparison', 'greet', 'cost_comparison', 'consumption', 'mood_great', 'goodbye', 'affirm', 'bot_challenge', 'nlu_fallback', 'tariff_comparison', 'mood_unhappy', 'tariff'
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 32 (1 distinct entities)
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  -   Found entity types: 'tariff_type'
2022-01-06 09:48:13 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component DucklingEntityExtractor
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component RegexEntityExtractor
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component CRFEntityExtractor
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component SklearnIntentClassifier
Fitting 2 folds for each of 6 candidates, totalling 12 fits

@dsmendes please share your config.yml file as reference.

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
 # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
 # If you'd like to customize it, uncomment and adjust the pipeline.
 # See https://rasa.com/docs/rasa/tuning-your-model for more information.
   - name: SpacyNLP
     model: "en_core_web_md"
     case_sensitive: False
   - name: SpacyTokenizer
   - name: "RegexFeaturizer"
     "case_sensitive": False
     "use_word_boundaries": True
   - name: SpacyFeaturizer
   - name: DucklingEntityExtractor
     url: "http://duckling:8000"
     dimensions: [ "time", "duration"]
     locale: "en_GB"
     timezone: "Europe/London"
     timeout: 3
   - name: RegexEntityExtractor
     case_sensitive: False
     use_lookup_tables: True
     use_regexes: True
     "use_word_boundaries": True
   - name: CRFEntityExtractor
     "BILOU_flag": True
     "max_iterations": 50
     "L1_c": 0.1
     "L2_c": 0.1
     "featurizers": [ ]

   - name: EntitySynonymMapper
   - name: SklearnIntentClassifier
     C: [ 1, 2, 5, 10, 20, 100 ]
     kernels: [ "linear" ]
     "gamma": [ 0.1 ]
     "max_cross_validation_folds": 5
     "scoring_function": "f1_weighted"

   - name: FallbackClassifier
     threshold: 0.5

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
 # No configuration for policies was provided. The following default policies were used to train your model.
 # If you'd like to customize them, uncomment and adjust the policies.
 # See https://rasa.com/docs/rasa/policies for more information.
   - name: RulePolicy
     enable_fallback_prediction: true
     core_fallback_action_name: action_default_fallback
     core_fallback_threshold: 0.3

@dsmendes do you really require SVM hyperparameter Grid Search?

@dsmendes Please confirm whether you get any error message while it’s running SklearnIntentClassifier. In my experience, a hyperparameter grid search takes a long time to select the best parameters, and even 5-fold cross-validation is a lot for an SVM.

Fitting 2 folds for each of 6 candidates, totalling 12 fits

Going by this message, it is working and it’s on 2 folds, so it needs to run 3 more folds, and then it will show the "Finished training component" message.

To cross-check, try setting only 2 folds and see whether it gives you a Finished message.

Please do let me know. Hope this will help you.
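For reference, the numbers in the log line quoted above can be reproduced with simple arithmetic. The sketch below assumes the line comes from an exhaustive grid search over the C, kernels, and gamma values in the shared config.yml, and that it reports the total number of folds per candidate. Under that reading, 6 candidates times 2 folds gives the 12 fits, and no further folds would be pending. The helper name count_fits is hypothetical.

```python
from itertools import product

# Values copied from the SklearnIntentClassifier block of the shared config.yml.
param_grid = {
    "C": [1, 2, 5, 10, 20, 100],
    "kernel": ["linear"],
    "gamma": [0.1],
}


def count_fits(grid, n_folds):
    """Return (candidates, total fits) for an exhaustive grid search."""
    # One candidate per combination of parameter values.
    candidates = list(product(*grid.values()))
    return len(candidates), len(candidates) * n_folds


n_candidates, n_fits = count_fits(param_grid, n_folds=2)
print(f"Fitting 2 folds for each of {n_candidates} candidates, totalling {n_fits} fits")
```

Because every candidate is fitted once per fold, a grid this small should normally finish quickly, which matches the run that completed in about a second.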

With two folds I get the same.

The strange thing is that sometimes the process finishes in a few seconds. That is why I do not understand the behaviour.

Please see the logs below. In this run it worked fine.

Training NLU model...
2022-01-06 14:14:53 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en_core_web_md'
2022-01-06 14:14:54 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en_core_web_md'.
2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 938 (7 distinct intents)

2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'mood_unhappy', 'bot_challenge', 'goodbye', 'mood_great', 'deny', 'greet', 'affirm'
2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 0 (0 distinct entities)
2022-01-06 14:14:54 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component DucklingEntityExtractor
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component RegexEntityExtractor

2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component CRFEntityExtractor
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component SklearnIntentClassifier
Fitting 2 folds for each of 6 candidates, totalling 12 fits
2022-01-06 14:14:56 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:56 INFO     rasa.nlu.model  - Starting to train component FallbackClassifier
2022-01-06 14:14:56 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:56 INFO     rasa.nlu.model  - Successfully saved model into '/tmp/tmpat9nhj6s/nlu'
NLU model training completed.
2022-01-06 14:14:58 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en_core_web_md'.
Training Core model...
Processed story blocks: 100%|█████| 3/3 [00:00<00:00, 2818.75it/s, # trackers=1]
Processed story blocks: 100%|█████| 3/3 [00:00<00:00, 1303.79it/s, # trackers=3]
Processed story blocks: 100%|█████| 3/3 [00:00<00:00, 304.89it/s, # trackers=12]
Processed story blocks: 100%|██████| 3/3 [00:00<00:00, 89.93it/s, # trackers=39]
Processed rules: 100%|████████████| 4/4 [00:00<00:00, 3918.08it/s, # trackers=1]
Processed trackers: 100%|███████████| 4/4 [00:00<00:00, 4039.78it/s, # action=9]
Processed actions: 9it [00:00, 15847.50it/s, # examples=8]
Processed trackers: 100%|██████████| 3/3 [00:00<00:00, 2001.74it/s, # action=12]
Processed trackers: 100%|███████████████████████| 4/4 [00:00<00:00, 2895.12it/s]
Processed trackers: 100%|███████████████████████| 7/7 [00:00<00:00, 1912.21it/s]
2022-01-06 14:14:59 INFO     rasa.core.agent  - Persisted model to '/tmp/tmpat9nhj6s/core'
Core model training completed.
Your Rasa model is trained and saved at '/home/models/20220106-141501.tar.gz'.

@dsmendes Right, it’s strange behaviour, and I can see you have a good amount of RAM for processing. Try clearing the system cache, deleting the older trained model, and re-training, this time with 3 folds.

My replies may be delayed, as I’m facing technical issues on the forum.

The behaviour persists with 2, 3, or 5 folds :(

@dsmendes I did not follow you on this. Could you clarify?

Please always mention me as @nik202 for a fast response.

@nik202 In my last message I forgot to say that I saw the same behaviour after clearing the cache and the models folder, for 2, 3, and 5 folds.

@dsmendes this is really a strange one. So you were able to train with 2 folds; what about 3 and 5? With 5 folds we can clearly see there is an issue, but what about 3?
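One possible reason the log keeps reporting 2 folds whatever value is configured: the fold count may be capped by the size of the rarest intent. The sketch below is an assumption based on a reading of the Rasa 2.x SklearnIntentClassifier, not a confirmed description of the installed version, so it is worth checking against the source. The function name effective_folds and the example label counts are hypothetical.

```python
from collections import Counter


def effective_folds(intent_labels, max_folds):
    """Assumed rule: at least 2 folds, at most max_folds, and no more
    than one fold per 5 examples of the rarest intent."""
    smallest = min(Counter(intent_labels).values())
    return max(2, min(max_folds, smallest // 5))


# Hypothetical data set in which the rarest intent has only 12 examples.
labels = ["greet"] * 200 + ["tariff"] * 12

for max_folds in (2, 3, 5):
    print(max_folds, "->", effective_folds(labels, max_folds))
```

If this rule applies, changing max_cross_validation_folds between 2, 3, and 5 would leave the actual grid search unchanged for such data, so the fold setting would be unlikely to explain the intermittent hang.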

@nik202 I am not able to train reliably with 2 folds either. The behaviour is the same using 2, 3, or 5 folds. Sometimes it finishes, sometimes it hangs. There is no pattern here.
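Since the hang is intermittent, one way to look for a pattern is to run the same command many times with a time limit and count the outcomes. This is a minimal sketch using only the Python standard library. The function names, the number of runs, and the timeout are hypothetical choices, and it assumes a run that exceeds the timeout can be treated as hung.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd once; return 'ok', 'failed', or 'hung' (killed after timeout_s)."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child on timeout; worker processes
        # it spawned may need separate cleanup.
        return "hung"
    return "ok" if result.returncode == 0 else "failed"


def tally(cmd, runs, timeout_s):
    """Repeat cmd and count how often each outcome occurs."""
    counts = {"ok": 0, "failed": 0, "hung": 0}
    for _ in range(runs):
        counts[run_with_timeout(cmd, timeout_s)] += 1
    return counts
```

For example, tally(["rasa", "train", "--num-threads", "-1"], runs=10, timeout_s=300) could be compared with the same call using "--num-threads", "1". If the hangs disappear with a single thread, that would point towards the parallel joblib/loky workers seen in htop, although this is only a hypothesis.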