What is the recommended way to version Rasa models? Currently I push the model directly to my Git repository, but over time the repository grows as model versions from different branches accumulate.
I found out that DVC can version large files, and it also supports AWS S3 as remote storage. Are there any resources or recommendations on model versioning best practices for Rasa projects?
I am looking for a solution that lets us easily roll back the model in production if we run into issues with a newly deployed model.
Regarding models, treat them as artifacts. There is a metadata.json inside the model tar that contains all the information about the model; you can use that JSON as metadata to keep track of each model.
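As a rough sketch of reading that metadata without unpacking the whole model: the function name print_model_metadata is made up for illustration, and it assumes the model is a gzip-compressed tar. Where metadata.json sits inside the archive may differ between Rasa versions, so the sketch searches for it instead of hard-coding a path.

```shell
# Print the metadata.json stored inside a model archive to stdout.
# Assumption: the archive is a .tar.gz; the member path is looked up,
# not hard-coded, because it may vary between Rasa versions.
print_model_metadata() {
  local archive="$1"
  local member
  # List archive members and take the first one named metadata.json.
  member="$(tar -tzf "$archive" | grep -E '(^|/)metadata\.json$' | head -n 1)"
  if [ -z "$member" ]; then
    echo "no metadata.json found in $archive" >&2
    return 1
  fi
  # -O writes the extracted member to stdout instead of to disk.
  tar -xzOf "$archive" "$member"
}
```

Something like print_model_metadata models/model.tar.gz could then be piped into a JSON tool or saved next to the model in S3.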
Besides, versioning in S3 can simply be tracked by the model timestamp or any unique attribute you want to use. We have a unique training ID, which I use as the S3 key prefix, and I place the model underneath it. The path looks something like this: training-ID-agent-ID/language/model.tar.gz. I also save the training data separately, so that I can bring a model back and verify which data it was trained on.
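A minimal sketch of that layout with the AWS CLI might look like the following. The function names, the bucket argument, and the example IDs are placeholders for illustration, not part of the setup described above; only the key pattern comes from it.

```shell
# Build the S3 key: <training-ID>-<agent-ID>/<language>/model.tar.gz
model_key() {
  printf '%s-%s/%s/model.tar.gz\n' "$1" "$2" "$3"
}

# Upload a trained model under that key.
# Arguments: local model file, bucket name (placeholder), training ID,
# agent ID, language.
upload_model() {
  local model_file="$1" bucket="$2" key
  key="$(model_key "$3" "$4" "$5")"
  aws s3 cp "$model_file" "s3://${bucket}/${key}"
}

# Fetch an earlier model back, for example to roll back or to check it
# against its training data.
# Arguments: bucket name, training ID, agent ID, language, destination path.
fetch_model() {
  local bucket="$1" key dest="$5"
  key="$(model_key "$2" "$3" "$4")"
  aws s3 cp "s3://${bucket}/${key}" "$dest"
}
```

For example, upload_model models/model.tar.gz my-bucket train-42 agent-7 en would store the file at train-42-agent-7/en/model.tar.gz, and fetch_model with the same IDs would bring it back.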
I'd recommend synchronizing your local models folder with S3 at the end of the training session and using the model from S3 as a build artifact. You can do so with the AWS CLI commands below, in a script that runs as part of your CI/CD (I have it set up as a step in my GitHub Actions, for example):
aws s3 sync ./models s3://rasa-artifacts/models --exclude .gitkeep
aws s3 sync ./logs s3://rasa-artifacts/logs --exclude .gitkeep # this is for testing logs
The naming convention that @souvikg10 mentioned is also a good idea, I think, since you might have different types of models for different purposes. You may also want to give the model a specific name at training time with the --fixed-model-name CLI option. A timestamp plus the model ID is a good place to start.
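One possible way to put that together, as a sketch only: the function names and the model ID value are made up here, and the UTC timestamp format is just one choice among many.

```shell
# Build a model name from a model ID and a timestamp.
# The second argument is optional; it defaults to the current UTC time.
build_model_name() {
  local model_id="$1"
  local stamp="${2:-$(date -u +%Y%m%dT%H%M%SZ)}"
  printf '%s-%s\n' "$model_id" "$stamp"
}

# Train with that name instead of the default generated one.
# Argument: model ID (a hypothetical label such as "support-bot").
train_named_model() {
  rasa train --fixed-model-name "$(build_model_name "$1")"
}
```

Calling train_named_model support-bot before the sync step would then leave a model with a predictable name in the models folder.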