I am testing how MitieEntityExtractor compares to Duckling or spaCy at identifying entities such as phone_no, time, date, home_address, email_address, amount_of_money and organisation, but there is very little documentation on this.
(I am using Rasa version 2.6.)
- Firstly, I am unsure whether I have set this up correctly. Can someone tell me if it is? This is my config.yml, which includes the MitieEntityExtractor:
pipeline:
  # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
  # If you'd like to customize it, uncomment and adjust the pipeline.
  # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: "MitieNLP"
    # language model to load
    model: "data/total_word_feature_extractor.dat"
  - name: "MitieEntityExtractor"
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
I found the total_word_feature_extractor.dat file at https://github.com/SigmaQuan/rasa_chatbot/tree/master/data.
In addition, my nlu.yml data has been labelled and looks a bit like this:
- intent: callback
  examples: |
    - I would like to arrange a callback [at 3](time)
    - I would like to arrange a callback
    - call me back tomorrow [at 1 pm](time)
    - arrange call [Monday the 13th](date) [at 13:20](time)
- Secondly, after training the bot, I get duplicate entity labels. It seems that the DIETClassifier and the MitieEntityExtractor each extract their own entities. Here is an example from the tracker events:
{"entity":"date","start":0,"end":8,"confidence_entity":0.8962544202804565,"value":"Thursday","extractor":"DIETClassifier"},
{"entity":"date","start":10,"end":25,"confidence_entity":0.46212172508239746,"value":"13th of January","extractor":"DIETClassifier"},
{"entity":"date","value":"Thursday 13th of January","start":0,"end":25,"confidence":null,"extractor":"MitieEntityExtractor"}]
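To make the overlap easier to see, here is a minimal sketch in plain Python (not part of Rasa) that checks which entities from different extractors cover the same characters. The entity list is copied from the tracker output above with the confidence fields left out, and the helper names `spans_overlap` and `find_cross_extractor_overlaps` are my own, purely for illustration.

```python
# Entities copied from the tracker output above (confidence fields omitted).
entities = [
    {"entity": "date", "start": 0, "end": 8,
     "value": "Thursday", "extractor": "DIETClassifier"},
    {"entity": "date", "start": 10, "end": 25,
     "value": "13th of January", "extractor": "DIETClassifier"},
    {"entity": "date", "start": 0, "end": 25,
     "value": "Thursday 13th of January", "extractor": "MitieEntityExtractor"},
]


def spans_overlap(a, b):
    """Return True if two entities share at least one character position."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def find_cross_extractor_overlaps(entities):
    """Return pairs of overlapping entities produced by different extractors."""
    overlaps = []
    for i, first in enumerate(entities):
        for second in entities[i + 1:]:
            different_source = first["extractor"] != second["extractor"]
            if different_source and spans_overlap(first, second):
                overlaps.append((first, second))
    return overlaps


for first, second in find_cross_extractor_overlaps(entities):
    print(
        f'{first["extractor"]} {first["value"]!r} overlaps '
        f'{second["extractor"]} {second["value"]!r}'
    )
```

On this example, both DIETClassifier spans fall inside the single MitieEntityExtractor span, so the same date is reported by both extractors.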
Why does this happen? Is it because my configuration is incorrect?
Thank you in advance.