Rasa train hangs

dsmendes · January 5, 2022, 4:10pm

Hi!

I am having a problem with Rasa when running these commands:

rasa train --num-threads -1
rasa test nlu --cross-validation

When I run either of these commands, the process hangs at random: sometimes it runs fine, sometimes it hangs. I have already checked, and it is not a memory problem, but I haven’t found the cause yet.

Looking at htop, I see the following process using around 100% of CPU, while memory is not fully used:

/home/xxxx/.virtualenvs/implementation38/bin/python -m joblib.externals.loky.backend.popen_loky_posix --process-name LokyProcess-8 --pipe 30
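For context, the LokyProcess-8 name in that command line comes from joblib’s default "loky" backend, which runs parallel tasks in separate Python worker processes; n_jobs=-1 means "one worker per CPU". The sketch below shows that behaviour in isolation. It assumes (this is not confirmed anywhere in the thread) that Rasa forwards --num-threads -1 to scikit-learn, which hands it to joblib as n_jobs; worker_pid and run_tasks are hypothetical helper names.

```python
import os
from typing import List, Tuple

from joblib import Parallel, cpu_count, delayed


def worker_pid(task_id: int) -> Tuple[int, int]:
    # Report which OS process executed this task.
    return task_id, os.getpid()


def run_tasks(n_tasks: int, n_jobs: int = -1) -> List[Tuple[int, int]]:
    # n_jobs=-1 asks joblib for one worker per CPU; with the "loky" backend
    # the workers are separate Python processes (listed as LokyProcess-N).
    return Parallel(n_jobs=n_jobs, backend="loky")(
        delayed(worker_pid)(i) for i in range(n_tasks)
    )


if __name__ == "__main__":
    results = run_tasks(8)
    pids = {pid for _, pid in results}
    print(f"{cpu_count()} CPUs, {len(results)} tasks, {len(pids)} distinct worker PIDs")
```

Results come back in task order, and on a multi-core machine the PIDs differ from the parent’s, which is why htop shows separate loky processes rather than extra threads.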

Here is my data rasa_forum.tar.gz (7.4 KB)

Python version 3.8

Rasa version 2.8.2 (I also tried 2.8.19 and saw the same behaviour)

Here is a screenshot of the logs:

Can someone help me with that? Thanks

nik202 · January 5, 2022, 5:53pm

@dsmendes can you please share the output of rasa --version and your system configuration (CPU, RAM and disk space)? That’s a really interesting issue.

@dsmendes Can I ask why you need cross-validation? Are you comparing NLU performance? A brief description of your use case would help, if you don’t mind sharing. Thanks.

dsmendes · January 5, 2022, 6:10pm

Thanks @nik202

RASA:

Rasa Version : 2.8.2
Minimum Compatible Version: 2.8.0
Rasa SDK Version : 2.8.2
Rasa X Version : None
Python Version : 3.8.0
Operating System : Linux-5.4.0-91-generic-x86_64-with-glibc2.27

SYSTEM

We use --cross-validation to compare NLU performance.

Sorry, I uploaded only part of our NLU data, so the specific use case is not shared. However, the data provided is enough to reproduce the behaviour.

nik202 · January 5, 2022, 6:12pm

@dsmendes do you have 4 GB of RAM on your Linux machine? Just confirming.

dsmendes · January 5, 2022, 6:17pm

Overall I have 15 GB. When this happens, I can confirm that it uses about 700 MB.

nik202 · January 5, 2022, 6:49pm

@dsmendes are you able to train the model without problems, or does it also show any error or warning messages? @dsmendes are you using a customised pipeline in config.yml?

dsmendes · January 6, 2022, 9:54am

I customized the suggested config. It is in the .tar attached to the post.

When training succeeds, everything runs correctly; when it hangs, the logs are:

Training NLU model...
2022-01-06 09:48:11 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en_core_web_md'
2022-01-06 09:48:13 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en_core_web_md'.
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 1706 (14 distinct intents)

2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'cost', 'deny', 'consumption_comparison', 'greet', 'cost_comparison', 'consumption', 'mood_great', 'goodbye', 'affirm', 'bot_challenge', 'nlu_fallback', 'tariff_comparison', 'mood_unhappy', 'tariff'
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 32 (1 distinct entities)
2022-01-06 09:48:13 INFO     rasa.shared.nlu.training_data.training_data  -   Found entity types: 'tariff_type'
2022-01-06 09:48:13 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:14 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component DucklingEntityExtractor
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component RegexEntityExtractor
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component CRFEntityExtractor
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 09:48:15 INFO     rasa.nlu.model  - Starting to train component SklearnIntentClassifier
Fitting 2 folds for each of 6 candidates, totalling 12 fits
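The last log line is scikit-learn’s own GridSearchCV progress message, not a Rasa one. A sketch under stated assumptions: synthetic data from make_classification stands in for the real (unshared) featurized messages, the SVC settings are an approximation, and only the grid (six C values, one gamma, a linear kernel, f1_weighted scoring) is copied from the config.yml posted in this thread. Six candidates times two folds gives the 12 fits in the log.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for featurized training messages (assumption).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_classes=3, random_state=0
)

# Grid copied from the SklearnIntentClassifier entry in config.yml:
# 6 values of C x 1 gamma x 1 kernel = 6 candidates.
param_grid = {"C": [1, 2, 5, 10, 20, 100], "gamma": [0.1], "kernel": ["linear"]}

search = GridSearchCV(
    SVC(probability=True, class_weight="balanced"),
    param_grid=param_grid,
    cv=2,                  # 2 folds, as reported in the log line
    scoring="f1_weighted",
    n_jobs=1,              # -1 would spread the 12 fits across joblib workers
    verbose=1,             # prints "Fitting 2 folds for each of 6 candidates, totalling 12 fits"
)
search.fit(X, y)
print("best params:", search.best_params_)
```

n_jobs is kept at 1 here so the sketch runs in a single process; setting it to -1 corresponds to the parallel setup behind --num-threads -1, assuming Rasa passes that value through.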

nik202 · January 6, 2022, 12:28pm

@dsmendes please share your config.yml file for reference.

dsmendes · January 6, 2022, 12:57pm

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
 # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
 # If you'd like to customize it, uncomment and adjust the pipeline.
 # See https://rasa.com/docs/rasa/tuning-your-model for more information.
   - name: SpacyNLP
     model: "en_core_web_md"
     case_sensitive: False
   - name: SpacyTokenizer
   - name: "RegexFeaturizer"
     "case_sensitive": False
     "use_word_boundaries": True
   - name: SpacyFeaturizer
   - name: DucklingEntityExtractor
     url: "http://duckling:8000"
     dimensions: [ "time", "duration"]
     locale: "en_GB"
     timezone: "Europe/London"
     timeout: 3
   - name: RegexEntityExtractor
     case_sensitive: False
     use_lookup_tables: True
     use_regexes: True
     "use_word_boundaries": True
   - name: CRFEntityExtractor
     "BILOU_flag": True
     "max_iterations": 50
     "L1_c": 0.1
     "L2_c": 0.1
     "featurizers": [ ]

   - name: EntitySynonymMapper
   - name: SklearnIntentClassifier
     C: [ 1, 2, 5, 10, 20, 100 ]
     kernels: [ "linear" ]
     "gamma": [ 0.1 ]
     "max_cross_validation_folds": 5
     "scoring_function": "f1_weighted"

   - name: FallbackClassifier
     threshold: 0.5

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
 # No configuration for policies was provided. The following default policies were used to train your model.
 # If you'd like to customize them, uncomment and adjust the policies.
 # See https://rasa.com/docs/rasa/policies for more information.
   - name: RulePolicy
     enable_fallback_prediction: true
     core_fallback_action_name: action_default_fallback
     core_fallback_threshold: 0.3
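Note that this config sets max_cross_validation_folds: 5, yet the logs report 2 folds. As far as I can tell from the Rasa 2.x source (an assumption worth checking against the installed version), SklearnIntentClassifier caps the fold count so that the smallest intent gets roughly 5 examples per fold, with a floor of 2. The sketch below reconstructs that rule; num_cv_splits is a hypothetical name, not a Rasa API.

```python
from collections import Counter
from typing import Iterable


def num_cv_splits(labels: Iterable[str], max_folds: int = 5) -> int:
    # Hypothetical reconstruction: aim for ~5 examples of the smallest
    # intent per fold, never fewer than 2 folds, never more than max_folds.
    counts = Counter(labels)
    smallest_intent = min(counts.values())
    return max(2, min(max_folds, smallest_intent // 5))


if __name__ == "__main__":
    # An intent with only 9 examples forces 2 folds even though max_folds=5.
    print(num_cv_splits(["greet"] * 300 + ["tariff"] * 9, max_folds=5))
```

Under this rule, any intent with fewer than 15 examples yields 2 folds regardless of the configured maximum, so changing max_cross_validation_folds would not change the "Fitting 2 folds" line.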

nik202 · January 6, 2022, 1:42pm

@dsmendes do you really need an SVM hyperparameter grid search?

@dsmendes Please confirm whether you get any error message while SklearnIntentClassifier is running. In my experience with hyperparameter grid search, finding the best parameters takes a lot of time, and even 5-fold cross-validation is a lot for an SVM.

Fitting 2 folds for each of 6 candidates, totalling 12 fits

Judging by this message, it is still working: it is on 2 folds, so it needs to run 3 more folds, and then it will show "Finished training component."

To cross-check, try setting only 2 folds and see whether it gives you the "Finished training component." message.

Please let me know. I hope this helps.

dsmendes · January 6, 2022, 2:19pm

With two folds I get the same.

The strange thing is that sometimes it finishes in a few seconds. That is why I do not understand the behaviour.

Please see the logs below. In this run, it worked fine.

Training NLU model...
2022-01-06 14:14:53 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en_core_web_md'
2022-01-06 14:14:54 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en_core_web_md'.
2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  - Training data stats:
2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  - Number of intent examples: 938 (7 distinct intents)

2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  -   Found intents: 'mood_unhappy', 'bot_challenge', 'goodbye', 'mood_great', 'deny', 'greet', 'affirm'
2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2022-01-06 14:14:54 INFO     rasa.shared.nlu.training_data.training_data  - Number of entity examples: 0 (0 distinct entities)
2022-01-06 14:14:54 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component DucklingEntityExtractor
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component RegexEntityExtractor

2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component CRFEntityExtractor
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:55 INFO     rasa.nlu.model  - Starting to train component SklearnIntentClassifier
Fitting 2 folds for each of 6 candidates, totalling 12 fits
2022-01-06 14:14:56 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:56 INFO     rasa.nlu.model  - Starting to train component FallbackClassifier
2022-01-06 14:14:56 INFO     rasa.nlu.model  - Finished training component.
2022-01-06 14:14:56 INFO     rasa.nlu.model  - Successfully saved model into '/tmp/tmpat9nhj6s/nlu'
NLU model training completed.
2022-01-06 14:14:58 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en_core_web_md'.
Training Core model...
Processed story blocks: 100%|█████| 3/3 [00:00<00:00, 2818.75it/s, # trackers=1]
Processed story blocks: 100%|█████| 3/3 [00:00<00:00, 1303.79it/s, # trackers=3]
Processed story blocks: 100%|█████| 3/3 [00:00<00:00, 304.89it/s, # trackers=12]
Processed story blocks: 100%|██████| 3/3 [00:00<00:00, 89.93it/s, # trackers=39]
Processed rules: 100%|████████████| 4/4 [00:00<00:00, 3918.08it/s, # trackers=1]
Processed trackers: 100%|███████████| 4/4 [00:00<00:00, 4039.78it/s, # action=9]
Processed actions: 9it [00:00, 15847.50it/s, # examples=8]
Processed trackers: 100%|██████████| 3/3 [00:00<00:00, 2001.74it/s, # action=12]
Processed trackers: 100%|███████████████████████| 4/4 [00:00<00:00, 2895.12it/s]
Processed trackers: 100%|███████████████████████| 7/7 [00:00<00:00, 1912.21it/s]
2022-01-06 14:14:59 INFO     rasa.core.agent  - Persisted model to '/tmp/tmpat9nhj6s/core'
Core model training completed.
Your Rasa model is trained and saved at '/home/models/20220106-141501.tar.gz'.

nik202 · January 6, 2022, 2:23pm

@dsmendes Right, it’s strange behaviour, and I can see you have a good amount of RAM for processing. Could you try clearing the system cache, deleting the older trained models, and re-training with 3 folds this time?

My replies may be delayed, as I’m facing technical issues on the forum.

dsmendes · January 6, 2022, 2:51pm

The behaviour persists with 2, 3 or 5 folds.

nik202 · January 6, 2022, 3:17pm

@dsmendes I didn’t follow you on this. Could you explain?

Please always mention me (@nik202) for a faster response.

dsmendes · January 7, 2022, 9:20am

@nik202 In my last message I forgot to say that I had the same behaviour after clearing the cache and the models folder, with 2, 3 and 5 folds.

nik202 · January 7, 2022, 1:50pm

@dsmendes this is a really strange one. So you were able to train with 2 folds; what about 3 and 5? With 5 folds we can clearly see an issue, but what about 3?

dsmendes · January 7, 2022, 2:29pm

@nik202 I am not able to train reliably with 2 folds either. The behaviour is the same with 2, 3 or 5 folds: sometimes it finishes, sometimes it hangs. There is no pattern.
