Rasa model versioning in s3?

dingusagar · July 2, 2021, 7:32am

What is the recommended way to version Rasa models? Currently I push the model directly to my git repository, but over time the repository grows as model versions from different branches accumulate.

I found out that DVC can be used to version large files, and it also supports AWS S3 as a remote. Are there any resources or recommendations on model versioning best practices for Rasa projects?

I am looking for a solution that lets us easily roll back the model in production if we run into issues with a newly deployed model.

souvikg10 · July 9, 2021, 9:48am

Regarding models, treat them as artifacts. There is a metadata.json inside the model tar that contains all the information about the model; you can use that JSON as metadata to keep track of it.

Besides, versioning in S3 is simply tracked by the model timestamp or any unique attribute you want to use. For us, we have a unique training ID which I use as the S3 key prefix, and I place the model underneath it. The path looks something like this: training-ID-agent-ID/language/model.tar.gz. I save the training data separately as well, so that I can bring a model back and verify which data it was trained on.
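As a rough illustration of the path layout described above, here is a minimal shell sketch. The function name `model_key`, the example IDs, and the bucket `my-rasa-models` are placeholders made up for illustration; they are not from the post.

```shell
#!/bin/sh
# Hypothetical helper: build an S3 key following the layout
#   <training-id>-<agent-id>/<language>/model.tar.gz
# Usage: model_key <training-id> <agent-id> <language>
model_key() {
  printf '%s-%s/%s/model.tar.gz' "$1" "$2" "$3"
}

# Example upload (bucket name is an assumption, left commented out on purpose):
# aws s3 cp ./models/model.tar.gz "s3://my-rasa-models/$(model_key train42 agent7 en)"
```

Keeping the key construction in one function means every upload and every rollback script derives the path the same way.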

niveK · July 27, 2021, 5:55pm

I’d recommend synchronizing your local models folder with S3 at the end of the training session and using the model from S3 as a build artifact. You can use the commands below to do so with the AWS CLI in a script as part of your CI/CD (I have it set up as a step in my GitHub Actions, for example):

aws s3 sync ./models s3://rasa-artifacts/models --exclude .gitkeep
aws s3 sync ./logs s3://rasa-artifacts/logs --exclude .gitkeep # this is for testing logs

That said, the naming convention @souvikg10 mentioned would also be a good idea, since you might have different types of models for different purposes. You may also want to give the model a specific name at training time with the --fixed-model-name CLI option. A timestamp plus the model ID is a good place to start.
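A minimal sketch of how the timestamp-plus-ID naming could be combined with --fixed-model-name, and how a named model could be pulled back for a rollback. The function names are hypothetical, the bucket path is reused from the commands above, and the sketch assumes Rasa writes the trained model to ./models/<name>.tar.gz.

```shell
#!/bin/sh
# Hypothetical helper: join a timestamp and a model id into one model name.
# Usage: model_name <timestamp> <model-id>
model_name() {
  printf '%s-%s' "$1" "$2"
}

# Train with a fixed name, then upload that one file.
# $1: model id, e.g. a short git SHA (an assumption, pick any unique id).
train_and_upload() {
  name="$(model_name "$(date -u +%Y%m%dT%H%M%SZ)" "$1")"
  rasa train --fixed-model-name "$name" &&
    aws s3 cp "./models/${name}.tar.gz" "s3://rasa-artifacts/models/${name}.tar.gz"
}

# Fetch a previously uploaded model by name, e.g. to roll back production.
# $1: model name as produced by model_name
rollback() {
  aws s3 cp "s3://rasa-artifacts/models/$1.tar.gz" "./models/$1.tar.gz"
}
```

Nothing runs until one of the functions is called, for example `train_and_upload "$(git rev-parse --short HEAD)"` in a CI step. Because each name is unique, older models stay in the bucket and remain available for rollback.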
