For OOM issues.
Yes, that could work, depending on what stage the OOM occurs. It's worth a try @nonola
Hi there!
With Rasa 3, I still got OOM unless I lowered the batch_size. With batch_size: [8, 32] and this config.yml,
language: pt
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 30
    learning_rate: 0.005
    batch_size: [8, 32]
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 10
I got this result:
Epochs: 100%|█████████████████████████████████████████| 30/30 [1:34:37<00:00, 189.26s/it, t_loss=100, i_acc=0.98, e_f1=0.869, g_f1=0.944]
Why is t_loss=100?
Thanks!
Just to confirm, are you getting the OOM issue with the DIET classifier or with TED?
With the DIET classifier.
I have some findings to share here. For me, the big impact I have seen is the model load time on startup. Since I run model inference on k8s, which quite often restarts the pod, the uptick in model load time is a problem. I tested 2.8.9 and 3.0.x, which uses tensorflow 2.6, and it seems like this is the problem.
I did not try an alternative to DIET. I can give that a go as well, but the training time increase wasn't the real issue for me tbh.
Part of me wonders … did you add the UnexpecTEDIntentPolicy in your pipeline? That component is now part of the rasa init pipeline and may help explain the extra memory/startup time.
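For reference, a freshly generated rasa init config includes it in the policies section, roughly like this (exact defaults vary by version, so treat this as a sketch):
policies:
  - name: MemoizationPolicy
  - name: RulePolicy
  - name: UnexpecTEDIntentPolicy
    max_history: 5
    epochs: 100
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    constrain_similarities: true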
@koaning - my pipeline is purely an NLU one
But taking the suggestion of using CRF instead of DIET for entity recognition did bring things back to normal.
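Roughly, the swap looks like this (just a sketch of the idea; the hyperparameters shown are illustrative, not my real config):
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  # sklearn-based CRF handles entities instead of DIET's TF-based CRF layer
  - name: CRFEntityExtractor
  - name: DIETClassifier
    epochs: 30
    batch_size: [8, 32]
    constrain_similarities: true
    entity_recognition: false  # DIET does intent classification only
  - name: EntitySynonymMapper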
It might be the tensorflow-addons package that is the culprit. I revisited your videos on DIET, and you mentioned that the CRF in DIET is fed the output of the transformer layer and is built using tensorflow-addons.
It looks like the memory explosion, as well as the increased load time, happens because of that, and removing entity recognition solves it.
You're probably right that something about tensorflow is the culprit, since our CRFEntityExtractor uses sklearn. Our working theory is that the culprit is tf.scan (see issue here).
Hi @joancipria, did you solve the problem of increased training time? I ran into the same issue when moving from Rasa 2.4.2 to 3.1.6.
Hi @fkoerner, I encountered the problem of increased training time when I upgraded rasa from 2.4.2 to 3.1.6.
Taking the officially provided formbot as an example, training on the GPU takes less than 1 minute in Rasa 2.4 but more than 2.5 minutes in Rasa 3.1, and I found that GPU utilization is very low. We are using an RTX 3090.
Is this normal? If not, is there any solution? Thanks!