NLU entity position misalignment due to custom Lemmatization Preprocessing

Seb · July 24, 2019, 7:04pm

Hey! I was following this tutorial, https://medium.com/@tatiana.parshina/building-rasa-nlu-custom-component-for-lemmatization-with-spacy-e2e6a9562c34, to implement lemmatization as a preprocessing step in my NLU pipeline.

I ran into an issue with the entity start and end positions. When a word gets lemmatized, it usually becomes shorter (e.g. looking -> look). So if “looking” is the first word in the sentence, it has the start value 0 and the end value 7 assigned; after lemmatization, however, it would need the end value 4. How can I make sure that the word positions get reassigned after lemmatization?
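To make the offset problem concrete, here is a minimal sketch in plain Python. It assumes 0-based character offsets with an exclusive end, and it uses a hypothetical toy lemma table (`TOY_LEMMAS`) and whitespace tokenization instead of spaCy, so it is an illustration of the mismatch rather than the tutorial's component. It contrasts two options: keeping the offsets of the original surface form while exposing the lemma as the token text, and recomputing offsets on the lemmatized string.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lemma table; a real component would take lemmas from spaCy.
TOY_LEMMAS = {"looking": "look", "flights": "flight"}


@dataclass
class Token:
    text: str   # what downstream featurizers see (here: the lemma)
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


def lemmatize_keep_offsets(message: str) -> list:
    """Lemma as token text, offsets taken from the ORIGINAL message."""
    tokens = []
    for match in re.finditer(r"\S+", message):
        surface = match.group()
        lemma = TOY_LEMMAS.get(surface.lower(), surface)
        tokens.append(Token(lemma, match.start(), match.end()))
    return tokens


def lemmatize_recompute_offsets(message: str) -> list:
    """Lemma as token text, offsets recomputed on the lemmatized string."""
    tokens, cursor = [], 0
    for match in re.finditer(r"\S+", message):
        surface = match.group()
        lemma = TOY_LEMMAS.get(surface.lower(), surface)
        tokens.append(Token(lemma, cursor, cursor + len(lemma)))
        cursor += len(lemma) + 1  # one space between lemmas
    return tokens


message = "looking for flights"
kept = lemmatize_keep_offsets(message)
shifted = lemmatize_recompute_offsets(message)

for k, s in zip(kept, shifted):
    print(f"{k.text:8} original=({k.start}, {k.end})  recomputed=({s.start}, {s.end})")
```

With recomputed offsets, every token after the first shortened word drifts away from the positions used in the training annotations, which still refer to the unlemmatized text. With spaCy, the original position of a token is available as `token.idx` and its lemma as `token.lemma_`, so keeping the original offsets is one possible way to avoid the drift.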

My lemmatization component is exactly the same as in the tutorial, and my NLU pipeline is as follows:

language: "de"

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer_Lemmatizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"

The model can be trained, but when training with lemmatization I get the warning “Make sure the start and end values of the annotated training examples end at token boundaries (e.g. don’t include trailing whitespaces or punctuation).” Does anyone have an idea?
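Going by the wording of the warning, the check seems to be whether each annotated entity starts and ends exactly at a token boundary. The sketch below reproduces that idea in plain Python; `misaligned_entities` is a hypothetical helper written for illustration, not a Rasa function, and the entity label `product` and the token spans are made-up example data.

```python
def misaligned_entities(token_spans, entities):
    """Return the entities whose start or end is not a token boundary.

    token_spans: list of (start, end) character offsets, end exclusive.
    entities: list of dicts with "start" and "end" keys, as in the
              annotated training examples.
    """
    starts = {start for start, _ in token_spans}
    ends = {end for _, end in token_spans}
    return [
        entity
        for entity in entities
        if entity["start"] not in starts or entity["end"] not in ends
    ]


# Annotation made on the original text "looking for flights".
entities = [{"start": 12, "end": 19, "value": "flights", "entity": "product"}]

# Token offsets taken from the original text.
original_spans = [(0, 7), (8, 11), (12, 19)]
# Token offsets recomputed on the lemmatized text "look for flight".
lemmatized_spans = [(0, 4), (5, 8), (9, 15)]

ok = misaligned_entities(original_spans, entities)
bad = misaligned_entities(lemmatized_spans, entities)
print("misaligned with original offsets:  ", ok)
print("misaligned with lemmatized offsets:", bad)
```

Running a check like this over the training data before training can show which examples are affected, and whether the tokens produced by the custom component carry offsets from the original or the lemmatized text.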
