I would like to know how to handle composite (i.e. multi-token) entities. For example:
- I am looking for a ‘Chicken Biryani’ recipe
- I am looking for a ‘Fried Chicken’ recipe
The entity can also be a single token:
- I am looking for a spicy ‘Chicken’ recipe
‘Chicken Biryani’, ‘Fried Chicken’ and ‘Chicken’ are all entities of type ‘Dish’.
With respect to creating the training data: if we annotate each of the above entities as ‘Dish’ (with multiple examples of each), would that be sufficient?
Also, are any changes needed in the tokenization scheme? Currently the tokenizer in the pipeline is set to “tokenizer_spacy”.
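To make the question concrete, here is a minimal sketch of what I mean by annotating a whole span as ‘Dish’. It is not taken from any library: the `[text](label)` markup, the helper names `parse_annotated` and `bilou_tags`, and the plain whitespace tokenization are all assumptions for illustration (the real “tokenizer_spacy” may split text differently). It only shows how one span annotation could map onto per-token BILOU-style tags, whether the entity is one token or several.

```python
import re

# Assumed markup for illustration only: [entity text](label)
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(text):
    """Strip the markup and return (plain_text, [(start, end, label), ...])."""
    plain, spans, last = "", [], 0
    for m in ANNOTATION.finditer(text):
        plain += text[last:m.start()]      # copy text before the annotation
        start = len(plain)
        plain += m.group(1)                # copy the entity text itself
        spans.append((start, len(plain), m.group(2)))
        last = m.end()
    plain += text[last:]                   # copy the remainder
    return plain, spans


def bilou_tags(plain, spans):
    """Tag whitespace tokens with B/I/L/U/O labels based on character spans."""
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", plain)]
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        # Indices of tokens that lie fully inside the annotated span
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        if not inside:
            continue
        if len(inside) == 1:
            tags[inside[0]] = "U-" + label         # single-token entity
        else:
            tags[inside[0]] = "B-" + label         # first token of the entity
            tags[inside[-1]] = "L-" + label        # last token of the entity
            for i in inside[1:-1]:
                tags[i] = "I-" + label             # tokens in between
    return [(tok, tag) for (tok, _, _), tag in zip(tokens, tags)]


examples = [
    "I am looking for a [Chicken Biryani](Dish) recipe",
    "I am looking for a [Fried Chicken](Dish) recipe",
    "I am looking for a spicy [Chicken](Dish) recipe",
]
for example in examples:
    plain, spans = parse_annotated(example)
    print(bilou_tags(plain, spans))
```

In this sketch the same label ‘Dish’ is used for all three examples; only the number of tokens covered by the span differs.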