What is the recommended way to version Rasa models? Currently I push the model directly to my git repository, but over time the repository grows as model versions from different branches accumulate.

I found out that DVC can be used to version large files, and it also supports AWS S3 as a remote. Are there any resources or recommendations on model versioning best practices for Rasa projects?

I am looking for a solution that lets us easily roll back the model in production if we run into issues with a newly deployed model.

Regarding models, treat them as artifacts. There is a metadata.json inside the model tar that contains information about the model; you can use that JSON as metadata to keep track of versions.
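
To illustrate the metadata.json idea, here is a minimal Python sketch that pulls every metadata.json out of a model archive without unpacking it to disk. The function name read_model_metadata is made up for this example, and where the file sits inside the tar (root or a subfolder) and which keys it holds depend on your Rasa version, so the sketch just collects whatever it finds.

```python
import json
import tarfile


def read_model_metadata(model_path):
    """Return {member_name: parsed_json} for every metadata.json in the archive.

    Hypothetical helper: the location of metadata.json inside the tar is not
    assumed, so every member with that file name is collected.
    """
    found = {}
    # "r:*" lets tarfile detect the compression (gzip for model.tar.gz).
    with tarfile.open(model_path, "r:*") as archive:
        for member in archive.getmembers():
            is_metadata = member.name.split("/")[-1] == "metadata.json"
            if member.isfile() and is_metadata:
                with archive.extractfile(member) as handle:
                    found[member.name] = json.load(handle)
    return found


# Example usage (the path is a placeholder):
# for name, meta in read_model_metadata("models/model.tar.gz").items():
#     print(name, sorted(meta))
```

The returned dictionaries can then be stored next to the artifact, or logged in CI, so each uploaded model can be matched to what it was trained with.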

Besides, versioning in S3 can simply be tracked by the model timestamp or any unique attribute you want to use. We have a unique training ID, which I use as the S3 object prefix, and I place the model underneath it. The path looks something like this: training-ID-agent-ID/language/model.tar.gz. I also save the training data separately, so that I can bring a model back and verify which data it was trained on.
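
As a rough sketch of that path convention, the helpers below build the object key and an `aws s3 cp` command for it. All function names and the example bucket and IDs are made up for illustration, and putting the training data under the same prefix is an assumption; the post only says it is saved separately.

```python
def model_key(training_id, agent_id, language):
    """Build the key following training-ID-agent-ID/language/model.tar.gz."""
    return f"{training_id}-{agent_id}/{language}/model.tar.gz"


def training_data_key(training_id, agent_id, language):
    """Key for the training data snapshot.

    Assumption: stored under the same prefix as the model; the file name
    training-data.tar.gz is hypothetical.
    """
    return f"{training_id}-{agent_id}/{language}/training-data.tar.gz"


def upload_command(local_path, bucket, key):
    """Argument list for `aws s3 cp`, suitable for subprocess.run."""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


# Example usage (bucket and IDs are placeholders):
# key = model_key("train-0042", "agent-7", "en")
# subprocess.run(upload_command("models/model.tar.gz", "my-bucket", key), check=True)
```

With keys like these, rolling back should mostly come down to pointing the deployment at the key of an earlier training ID.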


I'd recommend synchronizing your local models folder with S3 at the end of the training session and using the model from S3 as a build artifact. You can do so with the AWS CLI commands below, in a script that runs as part of your CI/CD (I have it set up as a step in my GitHub Actions, for example):

aws s3 sync ./models s3://rasa-artifacts/models --exclude .gitkeep
aws s3 sync ./logs s3://rasa-artifacts/logs --exclude .gitkeep # this is for testing logs

The naming convention @souvikg10 mentioned would also be a good idea, I think, since you might have different types of models for different purposes. You may also want to give the model a specific name at training time with the --fixed-model-name CLI option. A timestamp plus the model ID is a good place to start.
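
A small Python sketch of that naming idea, for use in a CI script: it combines a UTC timestamp with a model ID and passes the result to `rasa train --fixed-model-name`. The helper names, the timestamp format, and the example ID are assumptions for illustration, not a Rasa convention.

```python
from datetime import datetime, timezone


def fixed_model_name(model_id, trained_at=None):
    """Return '<YYYYmmdd-HHMMSS>-<model_id>' (format chosen for this sketch)."""
    if trained_at is None:
        trained_at = datetime.now(timezone.utc)
    stamp = trained_at.strftime("%Y%m%d-%H%M%S")
    return f"{stamp}-{model_id}"


def train_command(model_id, trained_at=None):
    """Argument list for `rasa train` with an explicit model name."""
    name = fixed_model_name(model_id, trained_at)
    return ["rasa", "train", "--fixed-model-name", name]


# Example usage in a CI step ("faq-bot" is a placeholder ID):
# subprocess.run(train_command("faq-bot"), check=True)
```

Because the timestamp comes first, the resulting file names sort chronologically in the models folder and in an S3 listing.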

