Email classification

How much training data do you have? For example, how many examples per entity? It often helps just to add a few more examples to the training data.

Also, if you want to extract numbers that follow a certain pattern, I recommend using either Duckling or the RegexEntityExtractor. Both are designed for entities with a predictable format.
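As a rough sketch of the regex approach, assuming the Rasa 2.x YAML training data format: the RegexEntityExtractor picks up regexes defined in the NLU data, and the name of the regex has to match the name of the entity. The entity name `phone_number`, the intent name `inform_phone`, and the 10-digit pattern below are placeholders for illustration, so adapt them to your own data.

```yaml
version: "2.0"
nlu:
# Regex name must match the entity name (here the hypothetical "phone_number").
- regex: phone_number
  examples: |
    - \d{10}

# A few annotated examples so the entity is known at training time.
# Intent name and phone numbers are made up for this sketch.
- intent: inform_phone
  examples: |
    - my number is [0612345678](phone_number)
    - you can reach me at [0698765432](phone_number)
```

If your phone numbers contain spaces or a country prefix, the pattern would need to be widened accordingly.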

It might also be a good idea to switch from CRFEntityExtractor to DIETClassifier for entity extraction, as it is usually a bit more powerful.

So you could try the following config, using the DIETClassifier to extract civilite and the RegexEntityExtractor to extract the phone number, for example:

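pipeline:
- name: WhitespaceTokenizer
- name: LexicalSyntacticFeaturizer
# word-level bag-of-words features
- name: CountVectorsFeaturizer
# character n-gram features (1 to 4 characters, within word boundaries)
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
# learned intent classification and entity extraction (e.g. civilite)
- name: DIETClassifier
  epochs: 100
# pattern-based entity extraction (e.g. the phone number)
- name: RegexEntityExtractor
- name: EntitySynonymMapper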