Below is my configuration:
pipeline:
  - name: WhitespaceTokenizer
    case_sensitive: False
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
    features: [
      ["low", "title", "upper"],
      ["BOS", "EOS", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit"],
      ["low", "title", "upper"],
    ]
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 50
    batch_strategy: sequence
    maximum_positive_similarity: 1.0
  - name: "DucklingHTTPExtractor"
    url: "http://localhost:8000"
    dimensions: ["time", "amount-of-money", "ordinal", "duration", "email", "phone-number"]
    locale: "en_EN"
    timeout: 3
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 500
    retrieval_intent: default
Sample entity training data for postal_code in the nlu.md file:
- 83642
- 94526
- 72762
- 94086
- 92236
- 57401
- 33458
- 94568
- 43123
- 44035
- 21740
- 48601
- 77840
- 91761
- 98059
- 48187
- 29483
- 64118
- 90604
- 60067
- 79605
- 52601
- 60302
- 32127
- 6810
- 98208
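To see whether any of these sample values resembles a four-digit number, a quick check like the sketch below counts them by length. The variable names are illustrative, and only the sample values listed above are included, not the full nlu.md file.

```python
from collections import Counter

# Sample postal_code values copied from the list above (not the full training set).
postal_code_samples = [
    "83642", "94526", "72762", "94086", "92236", "57401", "33458",
    "94568", "43123", "44035", "21740", "48601", "77840", "91761",
    "98059", "48187", "29483", "64118", "90604", "60067", "79605",
    "52601", "60302", "32127", "6810", "98208",
]

# Count how many samples have each number of characters.
length_counts = Counter(len(value) for value in postal_code_samples)

# Collect the samples that are exactly four characters long, like a year.
four_digit = [value for value in postal_code_samples if len(value) == 4]

print(dict(length_counts))
print(four_digit)
```

On this sample the check reports a single four-digit value (6810). Whether that is related to the misclassification described below is only a guess on my part.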
Query1: {"text": "min sales in 2000"}
Output1:
{
  "intent": { "name": "PepsiSalesData", "confidence": 1.0 },
  "entities": [
    {
      "entity": "action",
      "start": 0,
      "end": 3,
      "value": "minimum",
      "extractor": "DIETClassifier",
      "processors": [ "EntitySynonymMapper" ]
    },
    {
      "entity": "measure",
      "start": 4,
      "end": 9,
      "value": "sales",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "postal_code",
      "start": 13,
      "end": 17,
      "value": "2000",
      "extractor": "DIETClassifier"
    },
    {
      "start": 10,
      "end": 17,
      "text": "in 2000",
      "value": "2000-01-01T00:00:00.000-08:00",
      "confidence": 1.0,
      "additional_info": {
        "values": [
          { "value": "2000-01-01T00:00:00.000-08:00", "grain": "year", "type": "value" }
        ],
        "value": "2000-01-01T00:00:00.000-08:00",
        "grain": "year",
        "type": "value"
      },
      "entity": "time",
      "extractor": "DucklingHTTPExtractor"
    }
  ],
  "intent_ranking": [
    { "name": "PepsiSalesData", "confidence": 1.0 },
    { "name": "greet", "confidence": 5.512297107657105e-8 }
  ],
  "response_selector": {
    "oscar": {
      "response": { "name": null, "confidence": 0.0 },
      "ranking": [],
      "full_retrieval_intent": null
    }
  },
  "text": "min sales in 2000"
}
Issue 1: { "entity": "postal_code", "start": 13, "end": 17, "value": "2000", "extractor": "DIETClassifier" }
This classification is wrong. Why is 2000 classified as postal_code? None of the postal codes in the training data look like years (2000, 2001 … 2010), yet DIETClassifier labeled it as postal_code. Please let me know what needs to be fixed.
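To make the conflict easier to spot, here is a minimal sketch that lists entities from different extractors whose character spans overlap. The function name `overlapping_entities` is hypothetical and not part of Rasa, and the entities are copied from Output1 and reduced to the fields the check needs.

```python
from typing import Dict, List, Tuple


def overlapping_entities(entities: List[Dict]) -> List[Tuple[Dict, Dict]]:
    """Return pairs of entities from different extractors whose spans overlap."""
    pairs = []
    for i, first in enumerate(entities):
        for second in entities[i + 1:]:
            different_extractor = first["extractor"] != second["extractor"]
            # Two half-open spans [start, end) overlap when each starts before the other ends.
            spans_overlap = first["start"] < second["end"] and second["start"] < first["end"]
            if different_extractor and spans_overlap:
                pairs.append((first, second))
    return pairs


# Entities copied from Output1, trimmed to the relevant fields.
output1_entities = [
    {"entity": "action", "start": 0, "end": 3, "value": "minimum", "extractor": "DIETClassifier"},
    {"entity": "measure", "start": 4, "end": 9, "value": "sales", "extractor": "DIETClassifier"},
    {"entity": "postal_code", "start": 13, "end": 17, "value": "2000", "extractor": "DIETClassifier"},
    {"entity": "time", "start": 10, "end": 17, "value": "2000-01-01T00:00:00.000-08:00",
     "extractor": "DucklingHTTPExtractor"},
]

conflicts = overlapping_entities(output1_entities)
for first, second in conflicts:
    print(f'{first["entity"]} ({first["extractor"]}) overlaps '
          f'{second["entity"]} ({second["extractor"]})')
```

For Output1 this reports one pair: the postal_code span from DIETClassifier (13 to 17) lies inside the time span from DucklingHTTPExtractor (10 to 17), so the same token "2000" is labeled twice.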
Query2: {"text": "min profit jersey new in 2000"}
Output2:
{
  "intent": { "name": "PepsiSalesData", "confidence": 1.0 },
  "entities": [
    {
      "entity": "action",
      "start": 0,
      "end": 3,
      "value": "minimum",
      "extractor": "DIETClassifier",
      "processors": [ "EntitySynonymMapper" ]
    },
    {
      "entity": "measure",
      "start": 4,
      "end": 17,
      "value": "profit jersey",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "state",
      "start": 18,
      "end": 21,
      "value": "new",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "postal_code",
      "start": 26,
      "end": 30,
      "value": "2000",
      "extractor": "DIETClassifier"
    },
    {
      "start": 23,
      "end": 30,
      "text": "in 2000",
      "value": "2000-01-01T00:00:00.000-08:00",
      "confidence": 1.0,
      "additional_info": {
        "values": [
          { "value": "2000-01-01T00:00:00.000-08:00", "grain": "year", "type": "value" }
        ],
        "value": "2000-01-01T00:00:00.000-08:00",
        "grain": "year",
        "type": "value"
      },
      "entity": "time",
      "extractor": "DucklingHTTPExtractor"
    }
  ],
  "intent_ranking": [
    { "name": "PepsiSalesData", "confidence": 1.0 },
    { "name": "greet", "confidence": 4.486597404707027e-8 }
  ],
  "response_selector": {
    "oscar": {
      "response": { "name": null, "confidence": 0.0 },
      "ranking": [],
      "full_retrieval_intent": null
    }
  },
  "text": "min profit jersey new in 2000"
}
Issue 2: { "entity": "measure", "start": 4, "end": 17, "value": "profit jersey", "extractor": "DIETClassifier" }. Measure entities are
. Why is the value returned as "profit jersey"? It should be only "profit".