Misaligned entity annotation error for custom NER

To build a custom NER model, I am using the following pipeline:

  • name: “SpacyNLP”
  • name: “SpacyTokenizer”
  • name: “RegexFeaturizer”
  • name: “SpacyFeaturizer”
  • name: “CRFEntityExtractor”

For my list of entities (~2000 examples covering 2 entity types), I find the start and end indices in my dataset using string matching. I then pass them in the JSON format described in the Training Data Format documentation. However, during training I get a misaligned entity annotation error. The error description says “Make sure start and end values of the annotating training examples end at token boundaries”.

How can I ensure that? String matching already gives me correct start and end indices. If the problem is caused by tokenization, how can I work around it?
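One plausible cause: a character span can be "correct" for string matching but still not line up with the tokens the tokenizer produces. For example, a plain substring search for "York" also matches inside "Yorkshire", and that span ends in the middle of a token. Below is a minimal sketch of checking and fixing offsets before writing the JSON. It assumes a simple regex tokenizer (`TOKEN_RE`, hypothetical) as a stand-in for SpacyTokenizer, whose real boundaries follow spaCy's rules and can differ. With spaCy itself, `nlp(text).char_span(start, end)` returns `None` when the offsets do not fall on token boundaries. In spaCy v3 it also accepts `alignment_mode="expand"`. The helper names and the `"inform"` intent are illustrative only.

```python
import json
import re
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: rough stand-in for a spaCy-style tokenizer. Runs of \w characters
# form one token; every other non-space character is its own token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def token_spans(text: str) -> List[Tuple[int, int]]:
    """Character (start, end) offsets of every token in text."""
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def is_aligned(text: str, start: int, end: int) -> bool:
    """True if start begins some token and end finishes some token."""
    spans = token_spans(text)
    return start in {s for s, _ in spans} and end in {e for _, e in spans}


def snap_to_tokens(text: str, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Widen [start, end) to the full tokens it overlaps; None if it overlaps none."""
    hits = [(s, e) for s, e in token_spans(text) if s < end and e > start]
    if not hits:
        return None
    return hits[0][0], hits[-1][1]


def annotate(text: str, value: str, entity: str) -> List[Dict]:
    """Find whole-word occurrences of value and return token-aligned entity dicts."""
    entities = []
    # Lookarounds reject matches inside longer words (e.g. "York" in "Yorkshire");
    # strip() drops leading/trailing whitespace from the value being searched for.
    pattern = r"(?<!\w)" + re.escape(value.strip()) + r"(?!\w)"
    for m in re.finditer(pattern, text):
        span: Optional[Tuple[int, int]] = (m.start(), m.end())
        if not is_aligned(text, *span):
            # Only reachable with a different tokenizer; widen to whole tokens.
            span = snap_to_tokens(text, *span)
            if span is None:
                continue
        start, end = span
        entities.append({"start": start, "end": end,
                         "value": text[start:end], "entity": entity})
    return entities


if __name__ == "__main__":
    text = "We shipped it to Yorkshire, not York."
    naive = text.find("York")  # hits "Yorkshire" first
    print(is_aligned(text, naive, naive + 4))  # False: ends mid-token
    # "inform" is a hypothetical intent name used only for illustration.
    example = {"text": text, "intent": "inform",
               "entities": annotate(text, "York", "city")}
    print(json.dumps({"rasa_nlu_data": {"common_examples": [example]}}, indent=2))
```

Running `is_aligned` over every generated annotation before training is a cheap way to find which of the ~2000 examples the extractor will reject. Whole-word matching alone removes the most common source of mid-token spans.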