NLU pipeline comparison: overlapping entities

Hello,

While trying to run a pipeline comparison, I got the following error:

The first entity is extracted correctly, while the second one most definitely is not.

This also happens with other overlapping entities.
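For reference, two entities "overlap" when their character spans share at least one position. The helper below is a minimal, hypothetical sketch (not Rasa's own evaluation code). It assumes entity dicts with `start` and `end` character offsets, half-open so that `text[start:end]` is the entity value, as in Rasa's output; the predictions are made up and only reuse one of the training sentences from this thread.

```python
from itertools import combinations


def spans_overlap(a, b):
    """Return True if two entities share at least one character.

    Offsets are half-open: text[start:end] is the entity value.
    """
    return a["start"] < b["end"] and b["start"] < a["end"]


def find_overlaps(entities):
    """Return every pair of entities whose character spans intersect."""
    return [(a, b) for a, b in combinations(entities, 2) if spans_overlap(a, b)]


text = "listenpreis von VIN KJ678TS passt ned"
# Hypothetical predictions: the second and third entities cover the same token.
predicted = [
    {"entity": "Falschwert", "start": 0, "end": 11},
    {"entity": "vin", "start": 20, "end": 27},
    {"entity": "ordernummer", "start": 22, "end": 25},
]

for a, b in find_overlaps(predicted):
    print(a["entity"], text[a["start"]:a["end"]], "<->", b["entity"], text[b["start"]:b["end"]])
```

Running it prints the single clashing pair, `vin KJ678TS <-> ordernummer 678`. Adjacent spans such as `[0, 11)` and `[11, 12)` are not counted as overlapping.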

The model trains without problems when run on its own, and so far this error has only occurred in the comparison setting.

It always seems to be the same pipeline. Here is the config for reference, with language: de and the following pipeline:

  • name: SpacyNLP
      • case_sensitive: true
  • name: SpacyTokenizer
  • name: RegexFeaturizer
  • name: CRFEntityExtractor
  • name: SpacyFeaturizer
  • name: SklearnIntentClassifier
      • C: [1, 2, 5, 10, 20, 100]
      • kernels: ["linear"]
  • name: CRFEntityExtractor
      • features: [["low", "title", "upper"], ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit", "pattern", "pos", "pos2"], ["low", "title", "upper"]]
      • max_iterations: 50
      • L1_c: 0.1
      • L2_c: 0.1
  • name: EntitySynonymMapper
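One detail that may be worth checking in this config: CRFEntityExtractor appears twice, once with default settings and once with custom features. The snippet below is a small hypothetical check that flags repeated components; the names are copied from the list above into a plain Python list rather than loaded from YAML. Whether the duplicate is what triggers the error in the comparison is an assumption, not something established in this thread, but two extractors of the same type could each predict an entity for the same tokens.

```python
from collections import Counter

# Component names in pipeline order, copied from the config in the post.
pipeline = [
    "SpacyNLP",
    "SpacyTokenizer",
    "RegexFeaturizer",
    "CRFEntityExtractor",
    "SpacyFeaturizer",
    "SklearnIntentClassifier",
    "CRFEntityExtractor",
    "EntitySynonymMapper",
]


def duplicate_components(names):
    """Return component names that occur more than once, in first-seen order."""
    counts = Counter(names)  # Counter keeps insertion order (Python 3.7+)
    return [name for name in counts if counts[name] > 1]


print(duplicate_components(pipeline))  # ['CRFEntityExtractor']
```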

Help would be greatly appreciated.

Hi @Tegel,

can you please share some of your annotated entities from the NLU file? It seems like the same parts of a message were annotated with multiple entities.

Hi @Tobias_Wochinger,

I've checked the NLU file and found no doubly annotated entities. The ones in question are:

However, I think I found the cause (and a solution). I am using a regex to extract the VIN, and the regex is fed into the CRF as a feature. I also learned that lookaheads can be used to require sub-patterns within a match, in this case at least one letter and one digit. After changing the regex accordingly, the problem did not occur again and training proceeded correctly.
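The post does not include the actual regex, so the patterns below are hypothetical. Judging only from the example VINs in the training data (seven alphanumeric characters, e.g. KJ678TS), a naive "seven letters or digits" pattern also matches purely numeric order numbers such as 4712783, so the regex feature would fire on ordernummer tokens too. Adding two lookaheads that require at least one letter and at least one digit inside the same token rules those out. The seven-character length is an assumption, and the patterns use Python's `re` syntax.

```python
import re

# Hypothetical naive VIN pattern: any token of exactly seven letters or digits.
NAIVE_VIN = re.compile(r"\b[A-Za-z0-9]{7}\b")

# Hypothetical lookahead version. Each lookahead scans ahead inside the same
# token (the character class excludes whitespace) without consuming characters:
#   (?=[A-Za-z0-9]*[A-Za-z])  at least one letter
#   (?=[A-Za-z0-9]*\d)        at least one digit
VIN_WITH_LOOKAHEADS = re.compile(
    r"\b(?=[A-Za-z0-9]*[A-Za-z])(?=[A-Za-z0-9]*\d)[A-Za-z0-9]{7}\b"
)


def matches(pattern, text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in pattern.finditer(text)]


order_sentence = "der listenpreis der Ordernummer 4712783 ist falsch"
vin_sentence = "listenpreis von VIN KJ678TS passt ned"

print(matches(NAIVE_VIN, order_sentence))            # ['4712783'], an order number
print(matches(VIN_WITH_LOOKAHEADS, order_sentence))  # []
print(matches(VIN_WITH_LOOKAHEADS, vin_sentence))    # ['KJ678TS']
```

In Rasa's Markdown training format, a pattern like this would typically be listed under a `## regex:vin` section so that RegexFeaturizer can use it.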

This leaves the question of how the CRF can block itself when extracting two different entities: simply choosing the entity with the highest confidence would have given the right answer in both cases.
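The "pick the highest confidence" idea can be sketched as a greedy post-processing step: sort candidate entities by confidence and keep each one only if it does not intersect an entity already kept. This is a hypothetical helper, not what Rasa's CRFEntityExtractor or its evaluation code does, and it assumes entity dicts with `start`, `end` and `confidence` keys. Note that a single linear-chain CRF assigns one label per token, so overlapping spans would normally come from combining the outputs of more than one extractor.

```python
def resolve_overlaps(entities):
    """Greedily keep the most confident entities whose spans do not intersect.

    Entities are dicts with 'start', 'end' (half-open offsets) and 'confidence'.
    The surviving entities are returned in text order.
    """
    kept = []
    for ent in sorted(entities, key=lambda e: e["confidence"], reverse=True):
        # Keep ent only if it lies entirely before or after every kept entity.
        if all(ent["end"] <= k["start"] or k["end"] <= ent["start"] for k in kept):
            kept.append(ent)
    return sorted(kept, key=lambda e: e["start"])


# Hypothetical candidates with made-up confidences.
candidates = [
    {"entity": "Falschwert", "start": 0, "end": 11, "confidence": 0.88},
    {"entity": "vin", "start": 20, "end": 27, "confidence": 0.91},
    {"entity": "ordernummer", "start": 22, "end": 25, "confidence": 0.40},
]

print([e["entity"] for e in resolve_overlaps(candidates)])  # ['Falschwert', 'vin']
```

Greedy selection is simple and usually adequate for a handful of candidates; ties keep the input order because Python's `sorted` is stable.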

Best regards

Mhm, can you please share a snippet of the NLU training file you mean?

I hope this helps:

## intent:Fehler
- der [listenpreis](Falschwert) der Ordernummer [4712783](ordernummer) ist falsch
- [listenpreis](Falschwert) von VIN [KJ678TS](vin) passt ned
- Der Preis für Ordernummer [1324027](ordernummer) ist falsch
- der [listenpreis](Falschwert:listenpreis) mit ordernummer [8982782](ordernummer) ist nicht sauber
- Der [Listenpreis](Falschwert:listenpreis) von [12345607](ordernummer) ist inkorrekt
- [Listenpreos](Falschwert:listenpreis)
- [listenpreos](Falschwert:listenpreis)

## intent:inform
- [ZY27671](vin)
- [C160305](vin)
- [C291143](vin)
- [KL10765](vin)
- [31774](ordernummer)
- [45006](ordernummer)
- [36252](ordernummer)
- [81578](ordernummer)
- [032683](ordernummer)
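To see how annotations like these become the character offsets that evaluation compares against, here is a minimal, hypothetical parser for the `[value](entity)` and `[value](entity:synonym)` forms used above. It is not Rasa's own Markdown reader and ignores any other syntax; it only illustrates that each annotation yields exactly one span in the plain text.

```python
import re

# Matches [value](entity) and [value](entity:synonym) annotations.
ANNOTATION = re.compile(
    r"\[(?P<value>[^\]]+)\]\((?P<entity>[^:)]+)(?::(?P<synonym>[^)]+))?\)"
)


def parse_example(line):
    """Turn one annotated training line into plain text plus entity spans."""
    text_parts, entities = [], []
    cursor = 0  # position in the annotated line
    offset = 0  # length of the plain text built so far
    for m in ANNOTATION.finditer(line):
        before = line[cursor:m.start()]
        text_parts.append(before)
        offset += len(before)
        value = m.group("value")
        entities.append({
            "start": offset,
            "end": offset + len(value),
            "value": value,
            "entity": m.group("entity"),
            "synonym": m.group("synonym"),  # None when no ":synonym" is given
        })
        text_parts.append(value)
        offset += len(value)
        cursor = m.end()
    text_parts.append(line[cursor:])
    return "".join(text_parts), entities


text, ents = parse_example("[listenpreis](Falschwert) von VIN [KJ678TS](vin) passt ned")
print(text)
for e in ents:
    print(e["entity"], e["start"], e["end"], text[e["start"]:e["end"]])
```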