Setting data augmentation to 0 - are there any specific disadvantages?

Hey, I have a question about the augmentation factor.

Problem

We currently have around 1000 stories and want to add new stories that start with a bot utterance (an utter action). Running docker-compose run rasa data validate reports zero conflicting stories. Unfortunately, Rasa finds unpredictable actions in the glued stories right after processing the data augmentation blocks. This makes sense, because the augmented stories contain identical sequences.

Possible solution

We can solve this problem by setting the data augmentation factor to zero.

Question

Before doing this: are there any specific disadvantages to setting the augmentation factor to 0? We couldn’t find a specific answer or a matching use case in the forum.

To us it feels like a risky move, since the change will be hard to revert once we have added many stories that start with an utterance. On the other hand, the memoization policy works well for us; it is not affected by augmentation and automatically ignores all augmented stories.

Hi Jason, that’s a good question. Augmentation helps to glue stories together and train the model on more contextual conversations, but the recommended way to judge its effect is to test the dialogue model on test stories that are not part of the training data. So you could set augmentation to 0 and then run the model against those test stories to see whether performance improves or degrades. I would also recommend writing end-to-end tests (Testing Your Assistant) and evaluating your assistant continuously by setting up a CI/CD pipeline (Setting up CI/CD).
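As a rough illustration of that comparison, here is a minimal sketch that trains one Core model per augmentation factor and evaluates each on held-out stories. It only wraps the rasa train core and rasa test core commands. The model names, the tests/ directory and the results/ folders are hypothetical placeholders, the helper functions are made up for this example, and the factor 50 is used as the second value on the assumption that it is the CLI default.

```python
import subprocess
from typing import List, Sequence

# Assumption: `rasa` is on the PATH. With the docker-compose setup from the
# question this would probably be ["docker-compose", "run", "rasa"] instead.
RASA: List[str] = ["rasa"]


def train_cmd(augmentation: int, name: str) -> List[str]:
    """Build a `rasa train core` call with the given augmentation factor."""
    return RASA + [
        "train", "core",
        "--augmentation", str(augmentation),
        "--fixed-model-name", name,  # hypothetical model name
    ]


def evaluate_cmd(name: str, stories: str = "tests/") -> List[str]:
    """Build a `rasa test core` call that scores one model on held-out stories."""
    return RASA + [
        "test", "core",
        "--model", f"models/{name}.tar.gz",
        "--stories", stories,          # hypothetical test-story location
        "--out", f"results/{name}",    # one report folder per model
    ]


def compare(factors: Sequence[int] = (0, 50)) -> None:
    """Train and evaluate one model per augmentation factor."""
    for factor in factors:
        name = f"core-aug{factor}"
        subprocess.run(train_cmd(factor, name), check=True)
        subprocess.run(evaluate_cmd(name), check=True)


if __name__ == "__main__":
    compare()
```

Comparing the reports written to results/core-aug0 and results/core-aug50 should show whether dropping augmentation hurts on conversations the model has not seen. Because both runs use the same training data, going back later only means training again with a non-zero factor.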