I’m doing some data augmentation, and the tool I currently use (GitHub - jasonwei20/eda_nlp: Code for the ICLR 2019 Workshop paper: Easy data augmentation techniques for boosting performance on text classification tasks) removes all apostrophes. This means it turns “I’m” into either “Im” or “I m” (not all apostrophes are created equal, I just found out; looking forward to cleaning my data).
Anyway, how does Rasa handle this? Should I work to get those apostrophes back into my training data or not?
Edit: Now I’m also wondering about other punctuation, like brackets: “Hi, I’m Jane (Gary’s Wife)”. If you ignored the brackets, you’d lose meaning, wouldn’t you?
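
For the apostrophe cleanup, here is a minimal sketch of what I have in mind. It is my own assumption of a reasonable pre-processing step, not something eda_nlp or Rasa provides: the function name `normalize_apostrophes` and the list of variants are hypothetical and would need adjusting to whatever characters actually show up in the data. The idea is to map the different apostrophe-like characters to one plain ASCII apostrophe before augmenting, so they are all treated the same way.

```python
# Hypothetical helper: map typographic apostrophe variants to the plain
# ASCII apostrophe (U+0027). The set of variants is an assumption; extend
# it to match the characters that actually occur in your data.
APOSTROPHE_VARIANTS = {
    "\u2019": "'",  # right single quotation mark (the "curly" apostrophe)
    "\u2018": "'",  # left single quotation mark
    "\u02bc": "'",  # modifier letter apostrophe
    "\u00b4": "'",  # acute accent, sometimes typed instead of an apostrophe
}

# str.maketrans accepts a dict whose keys are single characters.
_APOSTROPHE_TABLE = str.maketrans(APOSTROPHE_VARIANTS)


def normalize_apostrophes(text: str) -> str:
    """Return text with every known apostrophe variant replaced by U+0027."""
    return text.translate(_APOSTROPHE_TABLE)


if __name__ == "__main__":
    # Brackets and other punctuation are left untouched.
    print(normalize_apostrophes("Hi, I\u2019m Jane (Gary\u2019s Wife)"))
```

This only makes the apostrophes consistent; whether they should stay in the training data at all is still the open question.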