How to avoid duplicate stories, when creating more samples through interactive learning

anoopmohan · May 8, 2019, 6:24am

Hi,

I have noticed that interactive learning creates many duplicate stories when I add more sample user inputs for an intent. So my questions are:

1. Do we really need to create a story for each sample user input added to an intent?
2. If we have duplicate stories for each intent's data, will it help in any way with machine learning or predictions?
3. If they are not needed, is it possible to avoid adding those duplicate stories (when we know the story already exists) when exporting the data at the end of the interactive learning process?
4. If we have many duplicate stories in stories.md, will it affect training performance or the bot's conversations? If yes, how can we avoid it?

Please advise.

Thank you, Anoop Mohan

anoopmohan · May 10, 2019, 4:31am

Could anyone please share their thoughts on this?

anoopmohan · May 13, 2019, 12:43pm

Hi, does anyone have any suggestions on this?

erohmensing · May 21, 2019, 12:56pm

Hey Anoop, I agree that we shouldn’t be creating duplicate stories through interactive learning. With regard to your 4th point, I believe duplicates will increase training time while only (maybe) minimally improving the performance of a machine learning policy.

With regard to question 3: why are you going through the process of doing IL for this story path if you know it’s already available?

Can you clarify what you mean by question 1?

anoopmohan · May 21, 2019, 1:51pm

Thank you @erohmensing for the response.

In my case, I don’t really want to update the nlu.md, stories.md, and domain.yml files manually when training the bot with new data. Hence, the only way for me to add a new intent and define entities for it is interactive learning (IL). That way we don’t need to touch any of those files, and IL itself updates them with the new data and stories.

Now, if we want to add more data samples to the same intent in the future, I again don’t want to update nlu.md manually, as editing the file by hand to define entities and synonyms is not a good idea. In this case, I still need to run the interactive learning process to add the new samples under an existing intent.

So, whenever we run IL to add a new sample to an existing intent, I believe stories.md also gets updated with a duplicate story, and I don’t see any way to avoid this.

To summarize: if I add 10 data samples for an intent by running IL, then 10 duplicate stories will be added to stories.md.

Regarding my 1st question: do we really need these 10 stories (as in my example above)? I believe we need only 1 story in this case, since all the remaining stories are duplicates. I know that we might need separate stories for different paths (e.g. happy or sad), but we don’t need duplicate stories for the same path (10 stories for the same happy path).

Regarding my 3rd question: is there any way to avoid creating duplicate stories like this during IL?

Please confirm. Let me know if anything in my question is still unclear.

Thank you.

erohmensing · May 21, 2019, 3:30pm

I reckon it would actually be much faster to add data by editing the files themselves than by going through IL this time, especially if you’re just going through the same path each time to update NLU data. That being said, there isn’t currently a way to avoid creating duplicate stories during IL, but I’ve created an issue for it. As we’ll probably be working hard to fix bugs on the new Rasa X product, I can’t say when we’ll get around to it. If you’d be interested in contributing to solving this problem, we’d happily take a contribution.
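If hand-editing nlu.md feels risky, a small script can do the appending instead of IL. The following is only a sketch, not part of Rasa: it assumes the Markdown NLU format with `## intent:<name>` headers followed by `- example` lines, and `add_examples` is a hypothetical helper name. It skips examples that are already listed under the intent.

```python
from typing import List


def add_examples(nlu_md: str, intent: str, examples: List[str]) -> str:
    """Return the nlu.md text with new examples added under '## intent:<intent>'.

    Hypothetical helper (not a Rasa API); assumes Markdown-format NLU data.
    """
    header = f"## intent:{intent}"
    lines = nlu_md.splitlines()
    unique = list(dict.fromkeys(examples))  # drop repeats, keep order

    # Find the intent section, if it already exists.
    start = next((i for i, l in enumerate(lines) if l.strip() == header), None)
    if start is None:
        # New intent: append a fresh section at the end of the file.
        if lines and lines[-1].strip():
            lines.append("")
        lines += [header] + [f"- {ex}" for ex in unique]
        return "\n".join(lines) + "\n"

    # The section ends at the next '## ' header or at the end of the file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith("## ")),
        len(lines),
    )
    section = range(start + 1, end)
    existing = {
        lines[i].strip()[2:].strip()
        for i in section
        if lines[i].strip().startswith("- ")
    }
    new = [ex for ex in unique if ex not in existing]

    # Insert right after the last example line of the section.
    insert_at = start + 1
    for i in section:
        if lines[i].strip().startswith("- "):
            insert_at = i + 1
    lines[insert_at:insert_at] = [f"- {ex}" for ex in new]
    return "\n".join(lines) + "\n"
```

Entity annotations such as `[Italian](cuisine)` can be passed as part of the example string; the helper treats each example as opaque text and does not validate it against domain.yml.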

Ghostvv · May 21, 2019, 3:37pm

Rasa Core deduplicates stories before passing them to policies.
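So duplicates should not change what the policies see, but they still clutter stories.md. If you want to tidy the exported file, something like the sketch below could work. It is a text-level illustration, not Rasa Core's internal deduplication: it assumes Markdown stories with `## story name` headers, treats two stories as duplicates when their steps match line for line (ignoring the story name and indentation), and `dedupe_stories` is a hypothetical helper name.

```python
def dedupe_stories(stories_md: str) -> str:
    """Return the stories.md text with repeated story bodies removed.

    Hypothetical helper (not a Rasa API); keeps the first copy of each story.
    """
    preamble = []  # any lines before the first story header
    stories = []   # list of (header_line, body_lines)
    for line in stories_md.splitlines():
        if line.startswith("## "):
            stories.append((line, []))
        elif stories:
            stories[-1][1].append(line)
        else:
            preamble.append(line)

    seen = set()
    chunks = []
    head = [l.rstrip() for l in preamble if l.strip()]
    if head:
        chunks.append("\n".join(head))

    for header, body in stories:
        steps = [l.rstrip() for l in body if l.strip()]
        # Compare stories by their steps only, ignoring name and indentation.
        key = tuple(l.strip() for l in steps)
        if key in seen:
            continue
        seen.add(key)
        chunks.append("\n".join([header] + steps))

    return "\n\n".join(chunks) + "\n"
```

For example, if IL exported `interactive_story_1` and `interactive_story_2` with the same `* greet` / `- utter_greet` steps, only `interactive_story_1` would remain in the output.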
