Misaligned entity annotation error for custom NER

samrudh · July 4, 2019, 6:49am

For building a custom NER model, my pipeline is as below:

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "RegexFeaturizer"
- name: "SpacyFeaturizer"
- name: "CRFEntityExtractor"

For the list of entities (~2000 examples for 2 entity types), I find the start and end indices in my dataset using string matching and pass them in the JSON format described in the Training Data Format documentation.

However, while training I get a misaligned entity annotation error. The error description says "Make sure start and end values of the annotating training examples end at token boundaries".

How can I ensure that? String matching already gives me correct start and end indices. If the problem is caused by tokenization, how can I overcome it?
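One way to narrow this down is to check every annotation against token offsets before training. Plain string matching can return a span that is a correct substring but starts or ends inside a token (for example "art" inside "party") or includes trailing whitespace. The sketch below illustrates that check. It is not Rasa's own validation code: the regex tokenizer is a stand-in assumption that will not split text exactly like SpacyTokenizer, and the helper names `token_offsets`, `is_aligned`, and `find_entity` are hypothetical.

```python
import re

# Stand-in tokenizer (an assumption): runs of word characters or single
# punctuation marks. spaCy's tokenizer follows different, richer rules.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def token_offsets(text):
    """Return a list of (start, end) character offsets, one per token."""
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def is_aligned(text, start, end):
    """True if `start` is the start of a token and `end` is the end of a token."""
    offsets = token_offsets(text)
    starts = {s for s, _ in offsets}
    ends = {e for _, e in offsets}
    return start in starts and end in ends


def find_entity(text, value, entity):
    """Locate `value` with plain string matching and build an entity
    annotation with the start/end/value/entity keys used in the JSON
    training data. Returns None if `value` does not occur in `text`."""
    start = text.find(value)
    if start == -1:
        return None
    return {"start": start, "end": start + len(value),
            "value": value, "entity": entity}


if __name__ == "__main__":
    text = "Book a party room in Berlin."
    candidates = [("Berlin", "city"), ("art", "hobby"), ("room ", "place")]
    for value, entity in candidates:
        ann = find_entity(text, value, entity)
        status = "ok" if is_aligned(text, ann["start"], ann["end"]) else "MISALIGNED"
        print(status, ann)
```

To run the same check against the tokenizer actually used in the pipeline, the offsets could instead be taken from a spaCy `Doc`, where `token.idx` is a token's starting character offset and `token.idx + len(token)` is its end. spaCy's `doc.char_span(start, end)` also returns `None` when the given character offsets do not line up with token boundaries, which makes it a quick filter for problematic examples.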
