Entity extraction with RegexEntityExtractor

Hello, I have a question: can we force an entity to be recognized only by RegexEntityExtractor and not by DIETClassifier?

Thank you for your help

No, that is currently not possible. However, this might change in the future.

For the RegexEntityExtractor to know which entities exist, you need to add them to your training examples; one annotated example is enough. As a consequence, however, DIETClassifier will also be trained on those entities.
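A minimal sketch of what that looks like, assuming the Rasa 2.x YAML training-data format: the regex is named after the entity (here the prod entity discussed in this thread), and a single annotated example makes the entity known. The intent name `inform` and the example sentence are hypothetical placeholders.

```yaml
version: "2.0"

nlu:
# The regex name matches the entity name, so RegexEntityExtractor
# knows which entity a match should be reported as.
- regex: prod
  examples: |
    - \b\d{6}\b

# One annotated example is enough for RegexEntityExtractor,
# but DIETClassifier will also learn the "prod" entity from it.
- intent: inform  # hypothetical intent name
  examples: |
    - my product number is [123456](prod)
```

The `\b` word boundaries keep the pattern from matching six digits inside a longer number.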


Yes, I noticed that. I have a regex named prod that matches only 6-digit numbers, and although I added 20 examples of 6-digit numbers, DIETClassifier still recognizes two-digit numbers as the prod entity. I will try adding more examples; I hope it will work.
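One thing that may help, sketched here under the same Rasa 2.x format assumption, is to also include training examples where short numbers appear without the prod annotation, so DIETClassifier sees two-digit numbers that are not labeled as that entity. The intent names and sentences below are hypothetical English placeholders; the real bot is configured for French, so its examples would be in French.

```yaml
version: "2.0"

nlu:
- intent: inform  # hypothetical intent name
  examples: |
    - my product number is [123456](prod)
    - I want to order [654321](prod)
    - please check product [102938](prod)

# Unannotated short numbers show DIETClassifier examples of
# two-digit numbers that are NOT the "prod" entity.
- intent: inform_quantity  # hypothetical intent name
  examples: |
    - I need 12 of them
    - send me 25 pieces
```

This does not guarantee DIETClassifier will stop tagging short numbers; it only gives the model counter-examples to learn from.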

My pipeline looks like this:

language: "fr"
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: RegexEntityExtractor
    use_regexes: True
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    random_seed: 1337
    batch_strategy: sequence
  - name: EntitySynonymMapper

Is there a component that helps DIETClassifier recognize regex features?

Thank you…

That would be the RegexFeaturizer.


Yes, I know that. But in my case, RegexFeaturizer doesn't seem to help DIETClassifier; it is not working at all. I will try something else.

Thank you