Is there a recommended way of managing large model files?

Currently, my team uses a GitHub repository to manage changes to the training examples, actions, config, etc., which is then hooked up to an instance of Rasa X through the integrated version control feature. My concern is that the model files are getting huge: we're only touching the tip of the iceberg in terms of training examples and the model is already at 150-200 MB. I fully expect it to pass the gigabyte mark in a couple of months.

Is there a recommended way of handling large model files? How does this affect Rasa X and integrated version control? Do you need to commit the model as well for integrated version control to work correctly? Does the model file become part of the PR after reviewing examples inside Rasa X?

Right now, we're thinking of making model training part of our build process (so that we avoid managing the model in GitHub altogether) and then generating the assets to be deployed to the right machines, roughly as sketched below.
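
Something like this minimal sketch is what I have in mind (the model name and target host are placeholders):

```bash
# CI build step: train the model with a predictable file name,
# then copy the resulting archive to the machine that serves it.
rasa train --fixed-model-name my-assistant
scp models/my-assistant.tar.gz deploy@bot-server:/srv/rasa/models/
```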

Am I thinking in the right direction or missing something? Many thanks in advance!


We added the `--remote-storage` option to `rasa run` so that you can store models in S3, GCS, or Azure. You'll find more info on this here.
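
For example, with S3 it looks roughly like this (the bucket name and credentials below are placeholders; the storage backend reads them from environment variables):

```bash
# Credentials and bucket for the remote storage backend (placeholders)
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=eu-west-1
export BUCKET_NAME=my-rasa-models

# Load the named model from S3 instead of the local filesystem
rasa run --remote-storage aws -m my-assistant.tar.gz
```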

You should not use git to store models.


Many thanks @stephens! I’ll explore this option. On paper, it looks like it has everything I need.