Problem with using two different entity extractors

Hello everyone,

I’m using two entity extractors in my pipeline, DIETClassifier and CRFEntityExtractor, because I’m using regex to extract emails.

My pipeline looks like this:

pipeline:
  - name: WhitespaceTokenizer
    case_sensitive: False
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: CRFEntityExtractor
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper

The problem is that when I call get_slot() in my custom action, I get a list of two values, [value1, value1]. I guess this means the entity is extracted twice, once by DIET and once by CRF. This is not the case for all my entities, though: some of them are extracted only once, by the DIETClassifier.

How can I fix this problem? Is there a way to specify which entity should be extracted by which extractor?

Thanks.

Duckling allows you to specify which types of entities you want it to extract, but CRF and DIET do not. You can create a custom action to select the entity, or use auto_fill in the entity specification and it will be done automatically.
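As a rough sketch of the custom action approach, something like the helper below could pick a single value when two extractors report the same entity. It assumes each entity in the latest message is a dict with "entity", "value" and "extractor" keys. The function name pick_entity_value, the PREFERRED_EXTRACTOR mapping and the "email" entity name are hypothetical and only here for illustration.

```python
from typing import Any, Dict, List, Optional

# Hypothetical mapping: which extractor to trust for which entity.
# Adjust the entity names to match your own domain.
PREFERRED_EXTRACTOR: Dict[str, str] = {
    "email": "CRFEntityExtractor",
}


def pick_entity_value(
    entities: List[Dict[str, Any]],
    entity_name: str,
    preferred: Optional[Dict[str, str]] = None,
) -> Optional[Any]:
    """Return one value for entity_name, or None if it was not extracted.

    If a preferred extractor is configured for this entity and it produced
    a match, that value is returned. Otherwise the first match is used.
    """
    if preferred is None:
        preferred = PREFERRED_EXTRACTOR

    # Keep only the entries for the entity we care about.
    candidates = [e for e in entities if e.get("entity") == entity_name]
    if not candidates:
        return None

    wanted = preferred.get(entity_name)
    for candidate in candidates:
        if candidate.get("extractor") == wanted:
            return candidate.get("value")

    # No preferred extractor matched: fall back to the first candidate.
    return candidates[0].get("value")
```

In a custom action you would probably pass it tracker.latest_message.get("entities", []) and return a SlotSet event with the result, so the slot ends up holding a single value instead of a list. I haven't tested this against your exact setup, so treat it as a starting point.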

Greg

Hello Sir @stephens, thank you for answering! I’m not sure I got your point: auto_fill doesn’t let you choose which extractor to use, or am I wrong? Could you please clarify? Thanks.

No, it doesn’t. It’s another alternative to the custom action approach.