Stopwords in intent examples with entities

kmegalokonomos · January 14, 2021, 9:46am

Hello everyone,

I am getting the following warning while training the model: “UserWarning: Misaligned entity annotation in message…” and in the end entities are not recognized correctly.

The problem disappears when I do not use stopwords. It seems that the entity alignment is checked against the token positions after stopword removal, or something similar. How can I work around this problem? Is some sort of configuration needed?

Thank you in advance, Konstantinos

Ghostvv · January 15, 2021, 9:51am

How do you use stop words?

kmegalokonomos · January 15, 2021, 9:59am

Hello Vladimir, when a user sends a message, it goes through the pipeline. The first step is a tokenizer. Inside the tokenizer, before the tokens are created, every word that is a stopword is removed from the message.

Ghostvv · January 15, 2021, 10:16am

What tokenizer are you using?

kmegalokonomos · January 15, 2021, 10:30am

A customized Spacy Tokenizer. Shouldn’t we be removing the stopwords before creating the tokens?

Ghostvv · January 15, 2021, 10:42am

Entities are aligned with tokens based on the indices of the first and last characters in the input text. My guess would be that, by removing stop words in your custom component, you created a discrepancy between the tokens and the input text, leading to entity misalignment.
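To make the offset argument concrete, here is a minimal sketch in plain Python. It is a simplified model of the alignment check, not Rasa's actual code: the helper names `tokenize` and `is_aligned`, the stopword list, and the example sentence are all assumptions made for illustration.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "to", "the"}


def tokenize(text):
    """Return (word, start, end) triples with character offsets into `text`."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def is_aligned(tokens, entity_start, entity_end):
    """Treat an entity as aligned if it starts at a token start and ends at a token end."""
    starts = {start for _, start, _ in tokens}
    ends = {end for _, _, end in tokens}
    return entity_start in starts and entity_end in ends


text = "book a flight to Athens"
# The annotation refers to character positions in the ORIGINAL text.
entity_start = text.index("Athens")
entity_end = entity_start + len("Athens")

# Tokens computed on the original text: offsets match the annotation.
aligned = is_aligned(tokenize(text), entity_start, entity_end)

# Tokens computed on the stopword-stripped text: every offset after a
# removed word shifts left, so the annotation no longer matches.
stripped = " ".join(w for w in text.split() if w not in STOPWORDS)
misaligned = not is_aligned(tokenize(stripped), entity_start, entity_end)
```

In this sketch the entity annotation stays fixed while the token offsets move, which is the kind of discrepancy described above.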

kmegalokonomos · January 15, 2021, 12:22pm

Hello Vladimir, yes, this is the case, because after stopword removal the entity has a different position than it had initially. What I don’t understand is the following: how do we remove stopwords then? Do we remove them but still keep their positions as “empty” tokens somehow?

Ghostvv · January 15, 2021, 3:46pm

I’m not sure here, sorry, I haven’t looked at the code for quite some time, but I think that as long as the offsets of the tokens correspond to the original indices in the text, it should work.
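One way to read this suggestion, sketched in plain Python rather than against the Rasa tokenizer API: compute the offsets on the untouched text first, then drop the stopword tokens, so the remaining tokens still point at their original positions. The function name `tokenize_without_stopwords` and the stopword list are hypothetical.

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "to", "the"}


def tokenize_without_stopwords(text):
    """Skip stopwords, but keep each remaining token's offsets in the original text."""
    tokens = []
    for match in re.finditer(r"\S+", text):
        if match.group().lower() in STOPWORDS:
            continue  # drop the token; the text itself is never modified
        tokens.append((match.group(), match.start(), match.end()))
    return tokens


text = "book a flight to Athens"
tokens = tokenize_without_stopwords(text)
# Each surviving token still slices back to itself in the original text.
roundtrip_ok = all(text[start:end] == word for word, start, end in tokens)
```

A caveat under the same assumptions: if a stopword sits at the boundary of an annotated entity (for example an entity that begins with "the"), dropping that token would presumably still leave the entity without a matching token start or end.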

kmegalokonomos · January 16, 2021, 12:48pm

OK, thank you, I will try keeping the same positions.
