Number Entity Extraction with DIET Fails on Synonym Mapping


I’m trying to switch from the old EmbeddingIntentClassifier with Duckling to using DIETClassifier alone. However, I’ve run into some problems with entity extraction.

I need some number entities to be extracted and converted to numeric form. For example, “three thousand” should be extracted as a number and returned to me as “3000”. This is how Duckling worked, and it is what I need (my form action maps these numbers to slots, whose values I then write to the corresponding database table). For this reason I have included EntitySynonymMapper in my pipeline and provided data like this in my training data file:

- [eight]{"entity": "number", "value": "8"}
- [twelve]{"entity": "number", "value": "12"}
- [two thousand and six hundred]{"entity": "number", "value": "2600"}
- [seven hundred and sixty]{"entity": "number", "value": "760"}

However, it still returns the numbers as text rather than in numeric form, like this:

* inform: [fifteen thousand](number)

What am I doing wrong? Any suggestions?

This is the pipeline I’m using:

  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
    num_transformer_layers: 4
    transformer_size: 256
    use_masked_language_model: True
    drop_rate: 0.25
    weight_sparsity: 0.7
    batch_size: [64, 256]
    embedding_dimension: 30
    hidden_layers_sizes:
      text: [512, 128]
  - name: EntitySynonymMapper
  - name: ResponseSelector

You should still be able to use Duckling while DIET is also in your pipeline. If I recall correctly, DIET and Duckling can extract entities side by side, and I’d argue Duckling fits your use case very well here, since it already gave you the numeric values you need. Just add the Duckling configuration to the same config.yml file and you should be good.
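A minimal sketch of what that could look like in config.yml, assuming Rasa 1.x (where the component is called DucklingHTTPExtractor; later releases renamed it to DucklingEntityExtractor) and a Duckling server running locally on port 8000, for example one started from the rasa/duckling Docker image. The URL and the restriction to the `number` dimension are assumptions to adapt to your setup; the other components stay as in the pipeline above.

```yaml
pipeline:
  # ... ConveRTTokenizer, featurizers and DIETClassifier as in the question ...
  - name: DucklingHTTPExtractor
    # Assumed address of a running Duckling server (e.g. the rasa/duckling image).
    url: "http://localhost:8000"
    # Only ask Duckling for numbers, so it does not also emit times, money, etc.
    dimensions: ["number"]
  - name: EntitySynonymMapper
  - name: ResponseSelector
```

With this in place, Duckling reports its matches as entities named `number` with a numeric value (3000 for “three thousand”) next to whatever DIET extracts, so the form action can prefer the entity whose `extractor` is DucklingHTTPExtractor.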