Setting data augmentation to 0 - are there any specific disadvantages?

jason · October 14, 2020, 10:18am

Hey, I have a question about the augmentation factor.

Problem

We currently have around 1000 stories and want to add new stories that start with an utterance. Running docker-compose run rasa data validate reports zero conflicting stories. Unfortunately, Rasa finds unpredictable actions in the glued stories right after processing the data augmentation blocks. This makes sense, since the augmented stories contain identical sequences.

Possible solution

We can solve this problem by setting the data augmentation factor to zero.

Question

Before doing this, are there any specific disadvantages if we set the augmentation to 0? We couldn’t find a specific answer or a use case for this question in the forum.

For us, it feels like a risky move, since the change will not be easy to undo once we have added many stories that start with an utterance. On the other hand, the memoization policy is working great; it is not affected by augmentation and automatically ignores all augmented stories.

dakshvar22 · October 26, 2020, 2:45pm

Hi Jason, that’s a good question. While augmentation helps to glue stories together and train the model on more contextual conversations, the recommended approach is to test the dialogue model on test stories that are not part of the training data. So, you could set augmentation to 0 and then test the model on those test stories to check for performance improvements or degradations. I would also recommend writing end-to-end tests (Testing Your Assistant) and consistently evaluating your assistant by setting up a CI/CD pipeline (Setting up CI/CD).
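
As a rough sketch of that workflow, assuming the docker-compose setup from the question (a service named rasa whose entrypoint is the rasa CLI) and held-out test stories kept in a tests/ directory (the directory name is an assumption, adjust it to your project), the two steps could look like this:

```shell
# Train with story augmentation switched off (no glued stories are generated).
docker-compose run rasa train --augmentation 0

# Evaluate the dialogue model on test stories that are not part of the training data.
# "tests/" is an assumed location for those stories.
docker-compose run rasa test core --stories tests/
```

Running the same test command against a model trained with the default augmentation gives a baseline to compare against.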

