CI/CD pipeline for testing app changes efficiently

We have a staging Rasa pipeline using GitLab CI. It currently builds our app image, deploys Rasa to our staging k8s cluster using that custom image as the app server, and cleans up the environment whenever we change the actions (meaning a new action automatically triggers cleanup and redeployment).

It works great, but it has a major drawback: the cleanup step is basically a helm uninstall. That means that for testing actions we have to retrain the bot every single time (since helm uninstall deletes the PV). This obviously limits the amount of debugging we can get done in a day.

We’ve thought of two solutions for this.

First one:

Modify the CI so that changes to the actions dir trigger an image build and push, followed by a redeploy of only the app server using the new image.
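A minimal sketch of what this could look like in `.gitlab-ci.yml`, assuming the Rasa X Helm chart's `app.tag` value controls the action-server image (job name, namespace, release and chart names are placeholders):

```yaml
# Hypothetical job: rebuild and redeploy only the action server when
# files under actions/ change. Registry path and release name are placeholders.
deploy-actions:
  stage: deploy
  rules:
    - changes:
        - actions/**/*
  script:
    - docker build -t "$CI_REGISTRY_IMAGE/app:$CI_COMMIT_SHORT_SHA" actions/
    - docker push "$CI_REGISTRY_IMAGE/app:$CI_COMMIT_SHORT_SHA"
    # Upgrade in place: only the app image tag changes, PVs stay untouched
    - helm upgrade rasa rasa-x/rasa-x --namespace staging --reuse-values --set app.tag="$CI_COMMIT_SHORT_SHA"
```

`--reuse-values` keeps the rest of the release's configuration as-is, so only the image tag moves.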

Second one:

Modify the CI (and maybe the Helm chart) to change the reclaim policy of the PersistentVolumes to Retain instead of Delete. Also use a simple if statement to check for existing volumes in the namespace: if there are none, do a full deploy; if there are some, mount them for the Rasa X pod.
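A rough CI-job sketch of this approach, assuming a `staging` namespace and placeholder release/chart names (the jsonpath filter and patch are standard kubectl usage, but verify against your cluster):

```yaml
# Hypothetical job: patch the reclaim policy after deploy, then choose
# between a full install and an in-place upgrade based on existing PVCs.
deploy:
  stage: deploy
  script:
    # Keep the trained-model volume around even after a helm uninstall
    - |
      for pv in $(kubectl get pv -o jsonpath='{.items[?(@.spec.claimRef.namespace=="staging")].metadata.name}'); do
        kubectl patch pv "$pv" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
      done
    # If PVCs already exist in the namespace, skip the full deploy
    - |
      if [ -z "$(kubectl get pvc -n staging -o name)" ]; then
        helm install rasa rasa-x/rasa-x --namespace staging -f extra-values.yaml
      else
        helm upgrade rasa rasa-x/rasa-x --namespace staging -f extra-values.yaml
      fi
```

One caveat: a retained PV moves to the Released phase once its claim is deleted, so you would also need to clear `spec.claimRef` before a new PVC can bind it again.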

What are your thoughts on the two options? Have you had any similar experience?

This is what our project structure looks like:

├── actions
│   ├── actions
│   │   ├──
│   │   ├──
│   │   └── requirements.txt
│   └── Dockerfile
├── config.yml
├── data
│   ├── nlu.yml
│   ├── rules.yml
│   └── stories.yml
├── domain.yml
├── extra-values.yaml
├── __pycache__
│   ├── ...
├── rabbitmq
│   └── values.yml
├── results
│   ├──...
└── tests

We appreciate any input that you may have

To add a bit to the original post.

I believe the first option to be the cleanest one and also the easiest to implement. However, the method of implementation is still unclear. I was thinking of a helm upgrade but haven't gotten to test it in our environment.

> The cleanup step is basically a helm uninstall.

Why isn’t this a helm upgrade where the only change is the image version/tag? As you state in your first option, this makes sense, but you could still need a new model depending on the changes.

> That means that for testing actions we have to retrain the bot every single time

Because you’ve created new actions? If the training data hasn’t changed, then you don’t need to retrain. If there’s a new action, then presumably there are new stories and an update to domain.yml, so you would need to retrain.

You should consider using our Helm chart. It includes Rasa X, which you may not be using; but an advantage of Rasa X that isn’t often pointed out explicitly is that it includes a model server for your Rasa OSS instances.

At the end of your CI/CD process, you would push an updated model to Rasa X using curl and the Rasa X API. You can mark that model as active, and Rasa X will push the new model to your Rasa OSS instances. This is really useful if you start to scale the number of Rasa OSS pods over time, because all of the pods will get the new model. No PVs.
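As a sketch, a final CI stage could upload the freshly trained model and tag it active. Host, token variable, model path, and job name below are placeholders, and the endpoint paths should be double-checked against the API docs for your Rasa X version:

```yaml
# Hypothetical publish stage: upload the model to Rasa X and mark it active.
publish-model:
  stage: publish
  script:
    # Upload the trained model archive
    - >
      curl -sf -H "Authorization: Bearer $RASA_X_TOKEN"
      -F "model=@models/model.tar.gz"
      "https://rasa-x.staging.example.com/api/projects/default/models"
    # Tag it "production" so Rasa X serves it to the Rasa OSS pods
    - >
      curl -sf -X PUT -H "Authorization: Bearer $RASA_X_TOKEN"
      "https://rasa-x.staging.example.com/api/projects/default/models/my-model/tags/production"
```

Because the Rasa OSS pods pull the active model from Rasa X's model server, no shared PV is needed for model distribution.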

Thanks for the great input.

We settled on modifying our GitLab CI to always run a docker build (for the app server) and a helm upgrade with the new tag, and reserved a special tag for running the full pipeline (build > clean > deploy > configure).
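The final layout could be sketched roughly like this (job names, the tag pattern, and release/chart names are hypothetical, not our exact config):

```yaml
# Hypothetical sketch: build + upgrade on every branch commit; the full
# pipeline (build > clean > deploy > configure) only runs for a reserved tag.
build-and-upgrade:
  stage: deploy
  rules:
    - if: '$CI_COMMIT_BRANCH'
  script:
    - docker build -t "$CI_REGISTRY_IMAGE/app:$CI_COMMIT_SHORT_SHA" actions/
    - docker push "$CI_REGISTRY_IMAGE/app:$CI_COMMIT_SHORT_SHA"
    - helm upgrade rasa rasa-x/rasa-x --reuse-values --set app.tag="$CI_COMMIT_SHORT_SHA"

full-pipeline:
  stage: deploy
  rules:
    - if: '$CI_COMMIT_TAG =~ /^full-deploy-/'
  script:
    - helm uninstall rasa
    - helm install rasa rasa-x/rasa-x -f extra-values.yaml
```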

We were using your Helm chart. Training is currently done locally, but we plan to introduce training on AWS and then upload the model to Rasa X in the future.