How to get the accuracy of MitieEntityExtractor with DIETClassifier or CRFEntityExtractor

Hello, guys

I’m using Rasa to train a model for Chinese, and I have a problem with the Rasa NLU pipeline. At first I used MitieEntityExtractor to extract entities. It works perfectly, but the training time is too long. So I changed my extractor to DIETClassifier and CRFEntityExtractor, and that is where the problem began.

I asked my bot a question: What’s the weather like today? MitieEntityExtractor extracted the “today” entity, which is correct. But DIETClassifier and CRFEntityExtractor both extracted the whole question, “What’s the weather like today”, as the entity. How can I improve this?

Please look below. The only difference between the config.yml files for these extractors is the extractor itself.

  1. MitieEntityExtractor and extracted results

language: "zh"

pipeline:
  - name: "MitieNLP"
    model: "data/total_word_feature_extractor_zh.dat"
  - name: "JiebaTokenizer"
    dictionary_path: "data/dict"
  - name: "MitieEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "RegexFeaturizer"
  - name: "MitieFeaturizer"
  - name: "SklearnIntentClassifier"

policies:
  - name: KerasPolicy
    epochs: 500
    max_history: 5
  - name: FallbackPolicy
    fallback_action_name: 'action_default_fallback'
  - name: MemoizationPolicy
    max_history: 5
  - name: FormPolicy

怎么样今天的天气
{
  "intent": {
    "name": "request_weather",
    "confidence": 0.45935003402929125
  },
  "entities": [
    {
      "entity": "date_time",
      "value": "今天",
      "start": 3,
      "end": 5,
      "confidence": null,
      "extractor": "MitieEntityExtractor"
    }
  ]
}

  2. CRFEntityExtractor and extracted results

language: "zh"

pipeline:
  - name: "MitieNLP"
    model: "data/total_word_feature_extractor_zh.dat"
  - name: "JiebaTokenizer"
    dictionary_path: "data/dict"
  - name: "RegexFeaturizer"
  - name: "MitieFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "SklearnIntentClassifier"

policies:
  - name: KerasPolicy
    epochs: 500
    max_history: 5
  - name: FallbackPolicy
    fallback_action_name: 'action_default_fallback'
  - name: MemoizationPolicy
    max_history: 5
  - name: FormPolicy

怎么样今天的天气
{
  "intent": {
    "name": "request_weather",
    "confidence": 0.5135718909497647
  },
  "entities": [
    {
      "entity": "date_time",
      "start": 0,
      "end": 8,
      "confidence_entity": 0.6655249585642125,
      "value": "怎么样今天的天气",
      "extractor": "CRFEntityExtractor"
    }
  ],

  3. DIETClassifier and extracted results

language: "zh"

pipeline:
  - name: "MitieNLP"
    model: "data/total_word_feature_extractor_zh.dat"
  - name: "JiebaTokenizer"
    dictionary_path: "data/dict"
  - name: "RegexFeaturizer"
  - name: "MitieFeaturizer"
  - name: "DIETClassifier"
    intent_classification: False
  - name: "EntitySynonymMapper"
  - name: "SklearnIntentClassifier"

policies:
  - name: KerasPolicy
    epochs: 500
    max_history: 5
  - name: FallbackPolicy
    fallback_action_name: 'action_default_fallback'
  - name: MemoizationPolicy
    max_history: 5
  - name: FormPolicy

怎么样今天的天气
{
  "intent": {
    "name": "request_weather",
    "confidence": 0.4984465527505863
  },
  "entities": [
    {
      "entity": "date_time",
      "start": 0,
      "end": 8,
      "value": "怎么样今天的天气",
      "extractor": "DIETClassifier"
    }
  ],
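
To make the difference between the three results concrete, here is a small sketch in plain Python (no Rasa calls, only string slicing) that applies the reported start/end offsets to the utterance. The offsets and values are copied from the results above; the helper name covers_whole_text is just something made up for this illustration.

```python
# Utterance and entity spans copied from the three parse results above.
text = "怎么样今天的天气"

results = {
    "MitieEntityExtractor": {"start": 3, "end": 5, "value": "今天"},
    "CRFEntityExtractor": {"start": 0, "end": 8, "value": "怎么样今天的天气"},
    "DIETClassifier": {"start": 0, "end": 8, "value": "怎么样今天的天气"},
}


def covers_whole_text(entity: dict, text: str) -> bool:
    """Return True if the entity span is the entire utterance (hypothetical helper)."""
    return entity["start"] == 0 and entity["end"] == len(text)


for extractor, entity in results.items():
    # Slice the utterance with the reported character offsets.
    span = text[entity["start"]:entity["end"]]
    # The reported value should match the slice given by the offsets.
    assert span == entity["value"]
    flag = "whole utterance" if covers_whole_text(entity, text) else "partial span"
    print(f"{extractor}: {span!r} ({flag})")
```

Only the MitieEntityExtractor span (characters 3 to 5) picks out 今天 ("today"); the other two spans run from 0 to 8, which is the full length of the utterance. A check like this could also be run over a larger set of parse results to count how often an extractor tags the whole message.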

This is the test result JSON file: CRFEntityExtractor_errors.json (12.5 KB)

I guess you are not using the latest version of Rasa. We had some issues extracting entities in Chinese. Can you try updating to the latest Rasa version? Does the error still occur?
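
If you are not sure which version is installed, one way to check from Python is sketched below. It uses only the standard library (importlib.metadata, so it assumes Python 3.8 or newer); the function name installed_version is made up for this example and is not part of Rasa.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed Rasa version, or "rasa: None" if Rasa is not
# installed in the current environment.
print("rasa:", installed_version("rasa"))
```

Running `rasa --version` on the command line gives the same information, and `pip install --upgrade rasa` upgrades the package in the active environment. After upgrading, the model has to be retrained before comparing the entity results again.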