Hi there, I have collected some training data through interactive learning, but it contains duplicate examples. For instance, the intent “affirm” has examples like “ok”, “yes”, “sure”, “ok”, “ok”, so “ok” appears three times. How can I make the training data distinct, or will performance actually benefit from the duplicate examples?
Thanks for your reply :smile:
Hi! I am wondering why we have to remove duplicates from the training data in the first place. If users actually send a message more often, shouldn’t it receive more weight?
Hi @magda, your intuition is reasonable. Ultimately it’s something you have to test yourself to see what gives the best performance.
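If you want to try both variants, here is a minimal sketch in plain Python for building a deduplicated copy of an intent's examples while keeping track of how often each one occurred. It is not part of any Rasa API: the function names `dedupe_examples` and `duplicate_counts` are hypothetical, and treating examples as equal after stripping whitespace and lowercasing is my own assumption, so adjust that to your data.

```python
from collections import Counter


def dedupe_examples(examples):
    """Return the examples with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for text in examples:
        # Assumption: "ok", " ok " and "OK" count as the same example.
        key = text.strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(text.strip())
    return unique


def duplicate_counts(examples):
    """Count how often each (normalised) example occurs."""
    return Counter(text.strip().lower() for text in examples)


# The "affirm" examples from the question above.
affirm = ["ok", "yes", "sure", "ok", "ok"]

unique_affirm = dedupe_examples(affirm)
counts = duplicate_counts(affirm)

print(unique_affirm)
print(counts["ok"])
```

You could then train one model on the original list and one on the deduplicated list, and compare them on the same held-out test set to see which works better for your assistant.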