Entity with typo being recognized and set (DIET)

  • Rasa X 0.27.3.
  • Rasa 1.8.0.
  • Linux (ubuntu 18.04).
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper

I have an entity called “month” and a lookup table that includes all the months. I have enough NLU examples for the lookup table to work, and I have also defined synonyms for every month.

“February” has the synonym “Februarys”. If the user writes “Februarys”, the month is set to “February”.

Problem: if the user writes “Februry”, which is not in the synonyms, it is still recognized as a month and the entity is set to exactly what the user typed (typo included).
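
This would be consistent with synonym mapping being an exact, case-insensitive lookup. That is an assumption about how EntitySynonymMapper behaves, not something verified in this thread. A minimal sketch in plain Python, where `SYNONYMS` and `map_synonym` are hypothetical names used only for illustration and not Rasa code:

```python
# Hypothetical synonym table: lowercased surface form -> canonical value.
SYNONYMS = {
    "februarys": "February",
    "decembers": "December",
}


def map_synonym(value: str) -> str:
    """Return the canonical value on an exact (case-insensitive) hit,
    otherwise return the extracted text unchanged."""
    return SYNONYMS.get(value.lower(), value)


print(map_synonym("Februarys"))  # listed synonym, so it is mapped
print(map_synonym("Februry"))    # unlisted typo, so it passes through as typed
```

Under that assumption, any spelling the extractor accepts but the synonym list does not contain ends up in the entity value as typed.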

I don’t want the entity to be set to the misspelled text. If “Februry” or “Dcembers” is recognized, it should be mapped to “February” or “December”, NOT stored as a string that is merely close to the real word.

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper
  - name: DIETClassifier
    entity_recognition: false
    epochs: 100

I switched to CRFEntityExtractor, and now it works better. It seems DIET entity recognition needs some fine-tuning before I can use it.

Hello Kim, how did you manage to map “Februry” and “Dcembers” to “February” and “December”? Thanks.
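
The thread does not say how the mapping was done. One possible approach, sketched here as an assumption rather than as the method used above, is to normalize the extracted value after NLU (for example in a custom action or slot validation) with fuzzy string matching from Python's standard-library `difflib`. The names `MONTHS` and `normalize_month` and the `cutoff` of 0.8 are hypothetical choices for illustration:

```python
import difflib
from typing import Optional

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Lowercased name -> canonical name, so matching ignores case.
_LOWERED = {month.lower(): month for month in MONTHS}


def normalize_month(value: str, cutoff: float = 0.8) -> Optional[str]:
    """Return the canonical month closest to `value`, or None if nothing
    is at least `cutoff` similar (assumed threshold, tune on real data)."""
    matches = difflib.get_close_matches(
        value.lower(), list(_LOWERED), n=1, cutoff=cutoff
    )
    return _LOWERED[matches[0]] if matches else None


print(normalize_month("Februry"))
print(normalize_month("Dcembers"))
print(normalize_month("pizza"))
```

Returning `None` for values that are not close to any month lets the caller reject the entity instead of storing the raw typo. A lower `cutoff` tolerates heavier misspellings but raises the risk of mapping unrelated words to a month.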