Hey.
I’m trying to switch from the old EmbeddingIntentClassifier + Duckling setup to using DIETClassifier alone. However, I’ve run into some problems with the entity extraction part.
I need some number entities to be extracted and converted to numeric form. For example, “three thousand” should be extracted as a number and returned to me as “3000”. This is how Duckling worked, and it is what I need: my form action maps these numbers to slots whose values I write to the corresponding table in my database.
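For reference, the kind of conversion Duckling was doing here can be approximated with a small word-to-number parser. The sketch below is a hypothetical helper (`words_to_number` is not part of Rasa), assuming plain English number words with an optional “and”; a form action could call something like it before writing the slot value to the database.

```python
# Hypothetical helper, not part of Rasa: convert English number words to an int.
# Assumes phrases like "three thousand" or "two thousand and six hundred".
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_number(text: str) -> int:
    total = 0    # sum of completed scale groups, e.g. the 2000 in 2600
    current = 0  # value of the group currently being built, e.g. the 600
    for word in text.lower().replace("-", " ").split():
        if word == "and":
            continue  # "seven hundred and sixty" == "seven hundred sixty"
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            # close the current group and scale it, e.g. "two thousand"
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognised number word: {word!r}")
    return total + current


print(words_to_number("two thousand and six hundred"))  # 2600
```

It deliberately ignores ordinals, decimals, and phrasings like “a hundred”, so treat it as a fallback sketch rather than a full replacement for Duckling.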
For this reason I have included EntitySynonymMapper in my pipeline and added training examples like these to my nlu.md file:
- [eight]{"entity": "number", "value": "8"}
- [twelve]{"entity": "number", "value": "12"}
- [two thousand and six hundred]{"entity": "number", "value": "2600"}
- [seven hundred and sixty]{"entity": "number", "value": "760"}
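Synonym mapping of this kind behaves like a lookup table keyed on the text that was annotated. The sketch below is a simplified model, not Rasa’s code: `SYNONYMS` and `map_synonym` are hypothetical names, and it assumes the mapper matches the extracted text exactly (ignoring case). Under that assumption, “twelve” becomes “12” (still a string, because the annotated value is the string "12"), while a phrase that never appeared in the training data, such as “fifteen thousand”, passes through unchanged.

```python
# Simplified model of a synonym lookup, ASSUMING an exact, case-insensitive
# match on the extracted entity text. Entries mirror the nlu.md examples above.
SYNONYMS = {
    "eight": "8",
    "twelve": "12",
    "two thousand and six hundred": "2600",
    "seven hundred and sixty": "760",
}


def map_synonym(entity: dict) -> dict:
    """Return a copy of the entity, with its value replaced if a synonym exists."""
    mapped = dict(entity)
    key = str(entity["value"]).lower()
    if key in SYNONYMS:
        mapped["value"] = SYNONYMS[key]  # the mapped value is the string "12", not 12
    return mapped


print(map_synonym({"entity": "number", "value": "twelve"}))
print(map_synonym({"entity": "number", "value": "fifteen thousand"}))
```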
However, the entity value still comes back as the original words instead of a number, like this:
* inform: [fifteen thousand](number)
What am I doing wrong? Any suggestions?
This is the pipeline I’m using:
pipeline:
  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
    num_transformer_layers: 4
    transformer_size: 256
    use_masked_language_model: True
    drop_rate: 0.25
    weight_sparsity: 0.7
    batch_size: [64, 256]
    embedding_dimension: 30
    hidden_layer_sizes:
      text: [512, 128]
  - name: EntitySynonymMapper
  - name: ResponseSelector