Feedback: Upgrading to TensorFlow 2.6

For the OOM issues, you could try lowering the batch_size.

Yes, that could work, depending on what stage the OOM occurs. It’s worth a try @nonola
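Something like this, for example (just a sketch; the exact values depend on your data and memory, and both DIETClassifier and TEDPolicy accept a batch_size parameter, so adjust whichever component is training when the OOM happens):

pipeline:
- name: DIETClassifier
  epochs: 100
  # lower the upper bound to reduce peak memory during training
  batch_size: [8, 32]

policies:
- name: TEDPolicy
  max_history: 5
  # same idea if the OOM happens during TED training
  batch_size: [8, 32]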

Hi there!

With Rasa 3, I still got an OOM without lowering the batch_size. With batch_size: [8, 32] and this config.yml,

language: pt

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 30
  learning_rate: 0.005
  batch_size: [8, 32]
  constrain_similarities: true
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
- name: FallbackClassifier
  threshold: 0.3
  ambiguity_threshold: 0.1

policies:
- name: MemoizationPolicy
- name: TEDPolicy
  max_history: 5
  epochs: 10


I got this result:

Epochs: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 30/30 [1:34:37<00:00, 189.26s/it, t_loss=100, i_acc=0.98, e_f1=0.869, g_f1=0.944]

Why is t_loss=100?

Thanks!

Just to confirm, are you getting the OOM issue with the DIET classifier or with TED?

With the DIET classifier.

I have some findings to share here. For me, the big impact I have seen is the model load time upon startup. Since I run the model inference on k8s, which quite often restarts the pod, the uptick in model load time is a problem. I tested 2.8.9 and 3.0.x, which use TensorFlow 2.6, and it seems like this is the problem.

I did not try an alternative instead of DIET. I can give that a go as well, but the training time increase wasn’t the real issue for me, tbh.

Part of me wonders … did you add the UnexpecTEDIntentPolicy in your pipeline? That component is now part of the rasa init pipeline and may help explain the extra memory/startup time.
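For reference, it shows up in the policies section of the config that rasa init generates, roughly like this (a sketch; the exact defaults can differ between Rasa versions):

policies:
- name: MemoizationPolicy
- name: RulePolicy
- name: UnexpecTEDIntentPolicy
  max_history: 5
  epochs: 100
- name: TEDPolicy
  max_history: 5
  epochs: 100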

@koaning - my pipeline is purely an NLU one :slightly_frowning_face:

But taking the suggestion of using CRF instead of DIET for entity recognition did bring things back to normal.
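In case it helps others, a sketch of what that swap can look like in a pipeline (hypothetical example; entity_recognition: false turns off DIET's entity head so it only does intent classification, and CRFEntityExtractor handles entities instead):

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
# sklearn-based extractor, no TensorFlow involved
- name: CRFEntityExtractor
- name: DIETClassifier
  epochs: 100
  # intent classification only; entities are handled by the CRF above
  entity_recognition: false
  constrain_similarities: true
- name: EntitySynonymMapper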

The TensorFlow Addons package might be the culprit. I revisited your videos on DIET, and you mentioned that the CRF in DIET is fed the output of the transformer layer and is built using TensorFlow Addons.

It looks like both the memory explosion and the increased load time happen because of that, and removing entity recognition solves it.

You’re probably right that something about TensorFlow is the culprit, since our CRFEntityExtractor uses sklearn. Our working theory is that the culprit is tf.scan (see the issue here).

Hi @joancipria, did you solve the problem of increased training time? I ran into the same issue when moving from Rasa 2.4.2 to 3.1.6.

Hi @fkoerner, I encountered the problem of increased training time when I upgraded Rasa from 2.4.2 to 3.1.6.

Taking the officially provided formbot as an example, training on the GPU takes less than 1 minute in Rasa 2.4 but more than 2.5 minutes in Rasa 3.1, and I found that GPU utilization is very low. We are using an RTX 3090.

Is this normal? If not, is there any solution? Thanks!