Incremental Training [Experimental]

Hey Rasa @community

We’re excited to announce the release of incremental training, a new experimental feature shipped in Rasa Open Source 2.2.0. This has been a frequently requested feature from our community, and we can’t wait for you to try it out!

:speedboat: Incremental training allows you to fine-tune an existing model after adding new training examples instead of training a new model from scratch. This cuts down training time significantly, reducing the time between annotating new data and testing its impact on model performance. Teams practicing conversation-driven development (CDD) benefit from greater efficiency, faster experimentation, and fewer bottlenecks.
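
As a quick sketch of the workflow (the blog post linked below has the full walkthrough), fine-tuning is kicked off from the command line with the --finetune flag; the epoch fraction shown here is just an illustrative value:

# Train a base model from scratch first
rasa train

# After adding new NLU examples to existing intents, fine-tune the
# previously trained model instead of retraining from scratch
rasa train --finetune

# Optionally run fine-tuning for only a fraction of the configured epochs,
# e.g. 20% of the epochs set in config.yml
rasa train --finetune --epoch-fraction 0.2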

For more details on how to get started with incremental training, including code snippets, check out our latest blog post:

:speech_balloon: Since this is still an experimental feature, we need your help testing and refining it based on your feedback. If you’ve tried incremental training, let us know how it performed on your data set. We’d particularly like to know how training times compared when fine-tuning vs. training from scratch, as well as how the fine-tuned model performed. Let us know below! :slight_smile:


FAQs

Does incremental training replace training from scratch?

In short, incremental training supplements training from scratch, but it isn’t intended to replace it. Incremental training is a fast way to incorporate new data, but we recommend periodically training your production model from scratch to get the full benefit of longer training time.

It’s important to note that incremental training can only be performed when updates are limited to adding new NLU training examples to existing intents. Other changes, like adding or merging intents, creating a new entity, or adding slots and actions to the domain, require training from scratch.
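
For example, the kind of change that works with fine-tuning is appending a new example to an intent that already exists in your NLU data (a minimal sketch; the intent and examples are placeholders):

# nlu.yml -- adding examples to an *existing* intent is fine for fine-tuning
nlu:
- intent: greet          # intent already known to the base model
  examples: |
    - hello
    - hi there
    - good morning

# Here "good morning" is the newly annotated example. Adding a brand-new
# intent, entity, slot, or action would require training from scratch instead.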

Which model configuration settings do I need to consider?

Incremental training requires the base model to be the same shape and size as the fine-tuned model. Aside from the number of epochs, you should keep your model configuration the same when creating your base model and when performing incremental training.

In addition, there are a few extra hyperparameters you should add to your config.yml file if you use the RegexFeaturizer or CountVectorsFeaturizer (follow the links to the docs for more details). These additional hyperparameters ensure that the sparse features keep a fixed size between the base and fine-tuned models.
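
As a rough sketch, such a configuration could look like the snippet below. Check the linked docs for the exact hyperparameter names and sensible sizes for your data set; the values here are only illustrative:

# config.yml -- reserve extra room for sparse features so the base and
# fine-tuned models keep the same shape and size
pipeline:
  - name: RegexFeaturizer
    number_additional_patterns: 10       # space for regex patterns added later
  - name: CountVectorsFeaturizer
    additional_vocabulary_size:
      text: 1000                         # space for new vocabulary in user messages
      response: 1000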

How does the performance of a fine-tuned model compare to training from scratch?

Our early experiments have shown that the performance of a fine-tuned model is very close to the performance of a model trained from scratch. For teams who do frequent annotation and testing, a fine-tuned model provides an accurate representation of how a production model would perform with the new data.

To learn more about our benchmarking experiments and get a deep dive into the inner workings of incremental training, check out our latest Algorithm Whiteboard video.

Great stuff! Really nice.

Question: is the model exactly the same in both versions of training? So if you train incrementally in a few steps, is the end result the same as if you had trained from scratch all over again?

I assume it’s not, considering the description about replacement, but just to make sure… :slight_smile:

Could you indicate how the models differ?

@Nasnl The model architecture would be exactly the same, but the learnt model weights would differ. However, as the post mentions, the performance of the two models should be comparable.


Thanks for the explanation @dakshvar22!

Hi, I went through the incremental training whiteboard at Rasa Algorithm Whiteboard - Incremental Training - YouTube. They only talk about keeping some wiggle room for the inputs (sparse features), but how do they handle new classes/labels on the output side? Will they also keep some kind of empty space to accommodate new classes, or does the pretrained model not consider the previous output layer?

Hi,

I have just tried this using rasa train nlu --finetune and found the training time was almost the same as standard training (I added one utterance to my NLU data). In fact it took 1 minute longer, at 40 minutes (though I was using my laptop at the same time, so it’s not a very scientific experiment). Interestingly, it ran my full epoch value of 150 when training, rather than 20% of 150. It did pop up with a message telling me it was using fine-tuning, however.

Just posting this as feedback as I know it’s an experimental feature.

Pipeline:

- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  lowercase: true
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 150
  random_seed: 43
- name: EntitySynonymMapper
- name: FallbackClassifier
  threshold: 0.7

System Info:

  • Rasa Version : 2.7.0
  • Minimum Compatible Version: 2.6.0
  • Rasa SDK Version : 2.7.0
  • Rasa X Version : None
  • Python Version : 3.8.10
  • Operating System : Linux-5.4.72-microsoft-standard-WSL2-x86_64-with-glibc2.29
  • Python Path : /home/marc/.pyenv/versions/3.8.10/bin/python3.8

Did you find out how to use incremental training?