Misaligned entity annotation

Akshit · December 21, 2018, 6:12am

Hi guys! I keep getting this extractor error during nlu training.

rasa_nlu.extractors.crf_entity_extractor - Misaligned entity annotation in sentence ‘What was today’s Attendance?’. Make sure the start and end values of the annotated training examples end at token boundaries (e.g. don’t include trailing whitespaces or punctuation).

I tried to remove whitespaces and everything. Does this affect entity prediction or intent classification?

JiteshGaikwad · December 21, 2018, 6:24am

Can you attach a screenshot of your training data?

Akshit · December 21, 2018, 7:06am

Here

yanvirin · February 13, 2019, 1:37am

Is there an update on this? I am still getting those when using the rasa nlu trainer to generate the training data for nlu training.

yanvirin · February 15, 2019, 11:25pm

I think I understand the problem. It is related to tokenization: the start and end indices of the entity do not fall exactly on token boundaries. You either need to change the tokenization so that the entity always begins and ends at the edges of a token, or change the indices of the entity so that they also cover the characters that share a token with the entity but were not annotated. Does that make sense?

Example with a zip code (whitespace tokenization):

good: I am from this zip code 93333
bad: I am from this zip code,93333
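To make the zip code example concrete, here is a minimal sketch of the alignment check. It assumes a plain whitespace split as a stand-in tokenizer, which is a simplification and not Rasa's actual tokenizer; the helper names `whitespace_token_spans` and `is_aligned` are made up for illustration.

```python
import re


def whitespace_token_spans(text):
    """Return (start, end) character offsets of whitespace-separated tokens.

    Assumption: a plain whitespace split, not Rasa's real tokenizer.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def is_aligned(text, start, end):
    """True if the span [start, end) begins at a token start and ends at a token end."""
    spans = whitespace_token_spans(text)
    starts = {s for s, _ in spans}
    ends = {e for _, e in spans}
    return start in starts and end in ends


good = "I am from this zip code 93333"
bad = "I am from this zip code,93333"

# In both sentences the annotated entity is the substring "93333".
good_ok = is_aligned(good, good.index("93333"), len(good))
bad_ok = is_aligned(bad, bad.index("93333"), len(bad))

print(good_ok)  # "93333" is its own token
print(bad_ok)   # "code,93333" is a single token, so the entity starts mid-token
```

In the bad sentence the entity starts in the middle of the token "code,93333", which is the kind of situation the warning describes.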

david-pureal · May 2, 2019, 5:39am

I encountered this problem too, but I’m using jieba_tokenizer for Chinese. It got fixed after I put all my entities into jieba’s user dict.

athenasaurav · May 2, 2020, 3:20am

I had this problem once. My issue was that I was using “.” inside an entity annotation in the training file.

Like this:

What is the [employee id.](id_details) of [kumar saurav](name)

I removed the “.” after employee id and it worked perfectly.

The changed example was:

What is the [employee id](id_details) of [kumar saurav](name)
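A quick way to spot annotations like this across a training file is sketched below. It assumes the `[text](label)` markdown annotation style shown above and flags any entity text that starts or ends with whitespace or ASCII punctuation; the function name `suspicious_annotations` is hypothetical, and whether a flagged entity really triggers the warning depends on the tokenizer in use.

```python
import re
import string

# Matches markdown-style annotations of the form [entity text](label).
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def suspicious_annotations(line):
    """Return (entity_text, label) pairs whose text starts or ends with
    whitespace or ASCII punctuation."""
    edge_chars = string.whitespace + string.punctuation
    flagged = []
    for text, label in ANNOTATION.findall(line):
        if text[0] in edge_chars or text[-1] in edge_chars:
            flagged.append((text, label))
    return flagged


before = "What is the [employee id.](id_details) of [kumar saurav](name)"
after = "What is the [employee id](id_details) of [kumar saurav](name)"

print(suspicious_annotations(before))
print(suspicious_annotations(after))
```

Only the first example is flagged, because "employee id." ends with a period.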

lqzmforer · June 3, 2020, 1:30am

I’m using JiebaTokenizer for Chinese too. But when I put the entities in the jieba user_dict, the problem still exists. rasa 1.10
