Stopwords in intent examples with entities

Hello everyone,

I am getting the following warning while training the model: “UserWarning: Misaligned entity annotation in message…”, and as a result entities are not recognized correctly.

The problem disappears when I do not use stopword removal. It seems that the entity alignment is checked against the token positions after stopword removal, or something similar. How can I work around this problem? Is some configuration needed?

Thank you in advance, Konstantinos

How do you use stop words?

Hello Vladimir, when a user sends a message, it goes through the pipeline. The first step is a tokenizer. Inside the tokenizer, before the tokens are created, every word that is a stopword is removed from the message.

Which tokenizer are you using?

A customized spaCy tokenizer. Shouldn’t we remove the stopwords before creating the tokens?

Entities are aligned with tokens based on the indices of the first and last characters in the input text. My guess is that, by removing stop words in your custom component, you created a discrepancy between the tokens and the input text, which leads to the entity misalignment.
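To illustrate the discrepancy, here is a minimal sketch in plain Python. It is not the actual Rasa check; `STOPWORDS`, `tokenize_with_offsets`, and `is_aligned` are hypothetical names, and I am assuming the check boils down to "the entity start must match some token start and the entity end must match some token end".

```python
import re

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "the", "to", "in", "me"}


def tokenize_with_offsets(text):
    """Split on whitespace and return (token, start, end) offsets into `text`."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def is_aligned(tokens, start, end):
    """Simplified stand-in for the alignment check (an assumption, not Rasa code)."""
    token_starts = {s for _, s, _ in tokens}
    token_ends = {e for _, _, e in tokens}
    return start in token_starts and end in token_ends


text = "book a table in Athens"
# Entity annotation expressed as character offsets in the ORIGINAL text.
entity_start = text.index("Athens")
entity_end = entity_start + len("Athens")

# Case 1: tokenize the original text. Offsets match the annotation.
tokens_ok = tokenize_with_offsets(text)

# Case 2: strip stopwords first, then tokenize the shortened string.
# Every token after a removed word now has a shifted offset.
stripped = " ".join(w for w in text.split() if w.lower() not in STOPWORDS)
tokens_shifted = tokenize_with_offsets(stripped)

print(is_aligned(tokens_ok, entity_start, entity_end))
print(is_aligned(tokens_shifted, entity_start, entity_end))
```

In the second case the token offsets refer to the shortened string, while the entity annotation still refers to the original message, so the two can no longer be matched.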

Hello Vladimir, yes, this is the case: after stopword removal the entity has a different position than it had initially. What I don’t understand is the following: how do we remove stopwords, then? Do we remove them but somehow still keep their tokens, with their positions, as “empty” tokens?

I’m not sure here, sorry; I haven’t looked at the code for quite some time. But I think that as long as the offsets of the tokens correspond to the original indices in the text, it should work.
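Something along these lines might work: tokenize the original text first, so every token carries its original offsets, and only then drop the stopword tokens. This is a rough sketch in plain Python, not a Rasa component; `Token`, `STOPWORDS`, and `tokenize_keep_offsets` are hypothetical names.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    text: str
    start: int  # offset of the first character in the ORIGINAL message
    end: int    # offset just past the last character in the ORIGINAL message


# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "the", "to", "in", "me"}


def tokenize_keep_offsets(text: str) -> List[Token]:
    """Tokenize the unmodified text, then skip stopwords.

    Offsets are computed before any filtering, so the surviving tokens
    still point at their positions in the original message.
    """
    tokens = []
    for match in re.finditer(r"\S+", text):
        if match.group().lower() in STOPWORDS:
            continue  # drop the stopword, but do not shift anything
        tokens.append(Token(match.group(), match.start(), match.end()))
    return tokens


text = "book a table in Athens"
tokens = tokenize_keep_offsets(text)
for token in tokens:
    # Each surviving token can be recovered by slicing the original text.
    assert text[token.start:token.end] == token.text
print(tokens)
```

In a spaCy-based tokenizer the same idea would presumably use `token.idx` for the start offset and `token.is_stop` for the filter. One caveat: if an annotated entity itself begins or ends with a stopword, dropping that token would probably still trigger the misalignment warning.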

OK, thank you. I will try keeping the same positions.