Is there a recommended way of managing large model files?

ganeshv · July 31, 2020, 12:08pm

Currently, my team uses a GitHub repository to manage changes to the training examples, actions, config, etc., which is hooked up to an instance of Rasa X through the integrated version control feature. My concern is that the model files are getting huge: we've only touched the tip of the iceberg in terms of training examples, and the model is already at 150-200MB. I fully expect it to hit the gigabyte threshold in a couple of months.

Is there a recommended way of handling large model files? How does this affect Rasa X and integrated version control? Do you need to commit the model as well for integrated version control to work correctly? Does the model file become part of the PR after reviewing examples inside Rasa X?

Right now, we’re thinking of making model training part of our build process (so that we avoid managing the model in GitHub altogether). The build would then generate assets to be deployed to the right machines for use.

Am I thinking in the right direction or missing something? Many thanks in advance!

stephens · August 5, 2020, 12:59am

We added the --remote-storage option to rasa run so that you can store models in S3, GCS or Azure. You’ll find more info on this here.

You should not use git to store models.
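
For the build-step approach raised in the question, here is a minimal sketch of how a build script might pick the freshly trained model and record its name, size, and checksum before handing it to whatever upload tool you use. It assumes trained models land as .tar.gz archives in a local models/ directory; the function names and the manifest fields are hypothetical, and the upload itself is deliberately left out.

```python
import hashlib
from pathlib import Path
from typing import Optional


def newest_model(models_dir: Path) -> Optional[Path]:
    """Return the most recently modified .tar.gz archive, or None if there is none."""
    archives = list(models_dir.glob("*.tar.gz"))
    return max(archives, key=lambda p: p.stat().st_mtime, default=None)


def describe_model(archive: Path) -> dict:
    """Collect name, size, and SHA-256 of an archive (hypothetical manifest layout)."""
    sha = hashlib.sha256()
    with archive.open("rb") as fh:
        # Hash in 1 MiB chunks so a multi-gigabyte model is never held in memory.
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            sha.update(chunk)
    return {
        "name": archive.name,
        "size_bytes": archive.stat().st_size,
        "sha256": sha.hexdigest(),
    }
```

A build job could call newest_model(Path("models")) after training and pass the result to describe_model, then upload the archive to the bucket and keep only the small description alongside the build logs, so the model itself never enters the git history.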

ganeshv · August 5, 2020, 10:01am

Many thanks @stephens! I’ll explore this option. On paper, it looks like it has everything I need.
