How to deduplicate my intent examples

Hi there, I have collected some training data through interactive learning, but it contains duplicate examples. For instance, the intent “affirm” has the examples “ok”, “yes”, “sure”, “ok”, “ok”, so “ok” appears three times. How can I remove the duplicates from the training data? Or would performance actually benefit from keeping duplicate examples?

There is a PR to fix this: “Remove duplicate examples when creating TrainingData” by hsm207 (Pull Request #4414, RasaHQ/rasa on GitHub).
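Until that change is available in your version, duplicates can be filtered out before training. Here is a minimal sketch in plain Python. It does not use any Rasa API and is not the code from the PR: it assumes the examples have already been loaded into a list of `(intent, text)` pairs, and the helper name `dedupe_examples` is made up for illustration.

```python
def dedupe_examples(examples):
    """Drop exact (intent, text) duplicates, keeping first-seen order.

    `examples` is assumed to be a list of (intent, text) tuples;
    this is a hypothetical in-memory format, not Rasa's own.
    """
    seen = set()
    unique = []
    for intent, text in examples:
        # Surrounding whitespace is ignored when comparing examples.
        key = (intent, text.strip())
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


examples = [
    ("affirm", "ok"),
    ("affirm", "yes"),
    ("affirm", "sure"),
    ("affirm", "ok"),
    ("affirm", "ok"),
]
unique_examples = dedupe_examples(examples)
print(unique_examples)
```

Only exact matches are removed here; “ok” and “OK” would still count as different examples unless you also lowercase the text.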

Thanks for your reply :smile:

Hi! I am wondering why we have to remove duplicates from the training data in the first place. If users actually send the same message more often, shouldn’t that message receive more weight?

Hi @magda, your intuition is reasonable. Ultimately it’s something you have to test yourself: train with and without the duplicates and see which gives the best performance on your data.
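As a starting point for that kind of test, it may help to see how heavily duplicated the data is. The sketch below is plain Python rather than anything from Rasa; it assumes the examples are available as a list of `(intent, text)` pairs, and `duplicate_counts` is a hypothetical helper name.

```python
from collections import Counter


def duplicate_counts(examples):
    """Return {(intent, text): count} for examples that occur more than once.

    `examples` is assumed to be a list of (intent, text) tuples.
    """
    counts = Counter(examples)
    return {example: n for example, n in counts.items() if n > 1}


examples = [
    ("affirm", "ok"),
    ("affirm", "yes"),
    ("affirm", "sure"),
    ("affirm", "ok"),
    ("affirm", "ok"),
]
repeated = duplicate_counts(examples)
print(repeated)
```

If only a few examples are repeated, the two training sets will be nearly identical and the comparison is unlikely to show much difference; if one message dominates an intent, evaluating both variants on the same held-out test set is more informative.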
