Adding new token patterns to Whitespace Tokenizer

Hi everyone, I would like to know how one adds new token patterns. I added some regex to the WhitespaceTokenizer for “-” and “/”, because I have entities like “off-road” where I would like the tokenizer to split the word, so that DIET can use “off” or “road” and still manage to match the entity “off-road”. People can say the term in many ways, for example “I like off road biking” or “I like off-road 4x4”.
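
For reference, here is roughly what I mean (just a sketch, assuming the `token_pattern` option of the tokenizer is what does the extra splitting; the exact regex is only an illustration):

```yaml
# config.yml (sketch): token_pattern matches become the tokens, so a
# pattern that excludes whitespace, "-" and "/" splits "off-road" into
# "off" + "road" and "vegetarian/vegan" into "vegetarian" + "vegan"
pipeline:
  - name: WhitespaceTokenizer
    token_pattern: '(?u)[^\s/-]+'
  # ... featurizers and DIETClassifier follow as usual
```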

How does one handle this? When I add the token pattern to the WhitespaceTokenizer, I get warnings like this:

```
Misaligned entity annotation in message 'vegetarian/vegan' with intent 'specify2019722'. Make sure the start and end values of entities ([(0, 16, '167569490')]) in the training data match the token boundaries ([(10, 11, '/')]). Common causes:
  1) entities include trailing whitespaces or punctuation
  2) the tokenizer gives an unexpected result, due to languages such as Chinese that don't use whitespace for word separation
  More info at https://rasa.com/docs/rasa/training-data-format#nlu-training-data
```
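
For context, the training example the warning refers to presumably looks something like this in the NLU data (a hypothetical reconstruction, reusing the ID from the warning as the entity name):

```yaml
# nlu.yml (hypothetical): the annotated span covers characters 0-16,
# which the warning compares against the token boundaries produced by
# the tokenizer after my custom splitting
nlu:
  - intent: specify2019722
    examples: |
      - [vegetarian/vegan]{"entity": "167569490"}
```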