NLU pipeline comparison; Overlapping entity

Tegel · March 12, 2020, 4:30pm

Hello,

while trying to run a pipeline comparison, this error was thrown:

The first entity is extracted correctly, while the second is most definitely not.

The same error also occurs with other overlapping entities.

The model trains without a problem when run on its own, and so far this error has only occurred during the pipeline comparison.

It always seems to be the same pipeline. Config for reference:

language: de
pipeline:
  - name: SpacyNLP
    case_sensitive: true
  - name: SpacyTokenizer
  - name: RegexFeaturizer
  - name: CRFEntityExtractor
  - name: "SpacyFeaturizer"
  - name: "SklearnIntentClassifier"
    C: [1, 2, 5, 10, 20, 100]
    kernels: ["linear"]
  - name: "CRFEntityExtractor"
    "features": [
      ["low", "title", "upper"],
      ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit", "pattern", "pos", "pos2"],
      ["low", "title", "upper"]
    ]
    "max_iterations": 50
    "L1_c": 0.1
    "L2_c": 0.1
  - name: EntitySynonymMapper

Help would be greatly appreciated.

Tobias_Wochinger · March 16, 2020, 8:20am

Hi @Tegel,

can you please share some of your annotated entities from the NLU file? It seems like the same message parts were annotated with multiple entities.
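One way to check for this is to compare the character spans of the annotations. The snippet below is only a rough sketch: it assumes the entities are available as dicts with `start`, `end` and `entity` keys (as in the JSON training data format), and the example message and the helper name `find_overlaps` are made up for illustration.

```python
from itertools import combinations


def find_overlaps(entities):
    """Return pairs of entity annotations whose character spans overlap."""
    overlaps = []
    for a, b in combinations(entities, 2):
        # Half-open spans [start, end) overlap when each one
        # starts before the other one ends.
        if a["start"] < b["end"] and b["start"] < a["end"]:
            overlaps.append((a, b))
    return overlaps


# Hypothetical example: the same characters tagged as two different entities.
text = "Ordernummer 4712783 ist falsch"
entities = [
    {"start": 12, "end": 19, "entity": "ordernummer", "value": "4712783"},
    {"start": 12, "end": 19, "entity": "vin", "value": "4712783"},
]

for first, second in find_overlaps(entities):
    print(first["entity"], "overlaps with", second["entity"])
```

If this reports any pair for a training example, that example is annotated twice over the same part of the message.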

Tegel · March 16, 2020, 9:04am

Hi @Tobias_Wochinger,

I’ve checked the NLU file and found no doubly annotated entities. The ones in question are:

Listenpreos
032683

I think I found the issue (and a solution to it), however. I am using a regex for extracting the VIN, which is fed as a feature into the CRF. I also learned that lookaheads can be used to check for smaller sub-patterns, in this case at least one letter and one digit. After changing the regex accordingly, the problem didn’t occur again and training proceeded correctly.
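For anyone reading along, here is a minimal sketch of the lookahead idea. The actual regex is not shown in this thread, so both patterns below (including the assumed length of 5 to 8 uppercase alphanumeric characters) are purely illustrative:

```python
import re

# Hypothetical "loose" pattern: any run of 5-8 uppercase letters or digits.
# It also matches purely numeric order numbers.
LOOSE_VIN = re.compile(r"\b[A-Z0-9]{5,8}\b")

# Hypothetical "strict" pattern: two lookaheads additionally require
# at least one letter and at least one digit somewhere in the token.
STRICT_VIN = re.compile(
    r"\b(?=[A-Z0-9]*[A-Z])(?=[A-Z0-9]*[0-9])[A-Z0-9]{5,8}\b"
)

samples = ["KJ678TS", "ZY27671", "C160305", "032683", "4712783"]
for token in samples:
    loose = bool(LOOSE_VIN.fullmatch(token))
    strict = bool(STRICT_VIN.fullmatch(token))
    print(f"{token}: loose={loose}, strict={strict}")
```

With the loose pattern, the order numbers would also get the VIN pattern feature; with the lookaheads they no longer do.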

This leaves the question of how the CRF can block itself when extracting two different entities. Choosing the prediction with the highest confidence would have been right in both cases.

Best regards

Tobias_Wochinger · March 23, 2020, 1:34pm

Hmm, can you please share a snippet of the part of the NLU training file that you mean?

Tegel · March 23, 2020, 3:53pm

I hope this helps:

## intent:Fehler
- der [listenpreis](Falschwert) der Ordernummer [4712783](ordernummer) ist falsch
- [listenpreis](Falschwert) von VIN [KJ678TS](vin) passt ned
- Der Preis für Ordernummer [1324027](ordernummer) ist falsch
- der [listenpreis](Falschwert:listenpreis) mit ordernummer [8982782](ordernummer) ist nicht sauber
- Der [Listenpreis](Falschwert:listenpreis) von [12345607](ordernummer) ist inkorrekt
- [Listenpreos](Falschwert:listenpreis)
- [listenpreos](Falschwert:listenpreis)

## intent:inform
- [ZY27671](vin)
- [C160305](vin)
- [C291143](vin)
- [KL10765](vin)
- [31774](ordernummer)
- [45006](ordernummer)
- [36252](ordernummer)
- [81578](ordernummer)
- [032683](ordernummer)
