How to deduplicate my intent examples

jianjunchang · September 26, 2019, 5:52am

Hi there, I have collected some training data through interactive learning, but it contains many duplicate examples. For example, the intent “affirm” has the examples “ok”, “yes”, “sure”, “ok”, “ok”, so “ok” appears three times. How can I remove the duplicates from the training data? Or does performance actually benefit from duplicate examples?

amn41 · September 30, 2019, 9:16am

There is a PR to fix this: “Remove duplicate examples when creating TrainingData” by hsm207 (Pull Request #4414, RasaHQ/rasa on GitHub).
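For anyone who wants to deduplicate by hand in the meantime, here is a minimal sketch. It assumes the examples have already been loaded into a plain dict mapping each intent name to a list of strings; the function name `dedupe_examples` and that dict layout are hypothetical and not part of Rasa's API.

```python
from typing import Dict, List


def dedupe_examples(examples_by_intent: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop repeated examples within each intent, keeping first-seen order.

    Hypothetical helper: the input layout is an assumption, not Rasa's format.
    """
    deduped = {}
    for intent, examples in examples_by_intent.items():
        # dict.fromkeys keeps insertion order and discards later repeats.
        deduped[intent] = list(dict.fromkeys(examples))
    return deduped


data = {"affirm": ["ok", "yes", "sure", "ok", "ok"]}
print(dedupe_examples(data))  # {'affirm': ['ok', 'yes', 'sure']}
```

Note that this compares strings exactly, so “ok” and “Ok” would both be kept unless you normalize case or whitespace first.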

jianjunchang · September 30, 2019, 9:18am

Thanks for your reply!

magda · September 18, 2020, 7:49am

Hi! I am wondering why we have to remove duplicates from the training data in the first place. If users actually send the same message more often, shouldn’t it receive more weight?

amn41 · September 18, 2020, 2:20pm

Hi @magda, your intuition is reasonable. Ultimately it’s something you have to test yourself to see what gives the best performance.
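Before running that comparison, it can help to see how much extra weight the duplicates actually carry. A rough sketch, assuming the examples for one intent are available as a list of strings; `duplicate_counts` is a hypothetical helper name, not something Rasa provides.

```python
from collections import Counter
from typing import Dict, List


def duplicate_counts(examples: List[str]) -> Dict[str, int]:
    """Return each example that occurs more than once, with its count."""
    counts = Counter(examples)
    return {text: n for text, n in counts.items() if n > 1}


affirm_examples = ["ok", "yes", "sure", "ok", "ok"]
print(duplicate_counts(affirm_examples))  # {'ok': 3}
```

If only a few examples are repeated a handful of times, the deduplicated and original data sets are nearly identical, and any difference in performance is likely to be small.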
