DietClassifier wrongly classifies to an entity which is not expected

laxnarasi · September 14, 2020, 4:09am

Below is my pipeline configuration:

pipeline:
  - name: WhitespaceTokenizer
    case_sensitive: False
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
    features:
      - ["low", "title", "upper"]
      - ["BOS", "EOS", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit"]
      - ["low", "title", "upper"]
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 50
    batch_strategy: sequence
    maximum_positive_similarity: 1.0
  - name: "DucklingHTTPExtractor"
    url: "http://localhost:8000"
    dimensions: ["time", "amount-of-money", "ordinal", "duration", "email", "phone-number"]
    locale: "en_EN"
    timeout: 3
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 500
    retrieval_intent: default

Sample entity training data for postal_code in the nlu.md file:

83642
94526
72762
94086
92236
57401
33458
94568
43123
44035
21740
48601
77840
91761
98059
48187
29483
64118
90604
60067
79605
52601
60302
32127
6810
98208
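As a quick sanity check on these samples, they can be grouped by length. This is only a sketch: the list is copied from the samples above, and `count_by_length` is a hypothetical helper name. It shows that one sample, 6810, has four digits, the same shape as a year such as 2000. Whether that contributes to the misclassification is an assumption and not something confirmed here.

```python
from collections import Counter

# Postal code samples copied from the training data listed above.
POSTAL_CODES = [
    "83642", "94526", "72762", "94086", "92236", "57401", "33458",
    "94568", "43123", "44035", "21740", "48601", "77840", "91761",
    "98059", "48187", "29483", "64118", "90604", "60067", "79605",
    "52601", "60302", "32127", "6810", "98208",
]


def count_by_length(samples):
    """Count how many samples there are for each string length."""
    return Counter(len(s) for s in samples)


by_length = count_by_length(POSTAL_CODES)
# Samples that have the same four-digit shape as a year.
four_digit = [s for s in POSTAL_CODES if len(s) == 4]

print(dict(by_length))
print(four_digit)
```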

Query 1: {"text": "min sales in 2000"}

Output 1:

{
  "intent": { "name": "PepsiSalesData", "confidence": 1.0 },
  "entities": [
    { "entity": "action", "start": 0, "end": 3, "value": "minimum", "extractor": "DIETClassifier", "processors": ["EntitySynonymMapper"] },
    { "entity": "measure", "start": 4, "end": 9, "value": "sales", "extractor": "DIETClassifier" },
    { "entity": "postal_code", "start": 13, "end": 17, "value": "2000", "extractor": "DIETClassifier" },
    {
      "start": 10,
      "end": 17,
      "text": "in 2000",
      "value": "2000-01-01T00:00:00.000-08:00",
      "confidence": 1.0,
      "additional_info": {
        "values": [
          { "value": "2000-01-01T00:00:00.000-08:00", "grain": "year", "type": "value" }
        ],
        "value": "2000-01-01T00:00:00.000-08:00",
        "grain": "year",
        "type": "value"
      },
      "entity": "time",
      "extractor": "DucklingHTTPExtractor"
    }
  ],
  "intent_ranking": [
    { "name": "PepsiSalesData", "confidence": 1.0 },
    { "name": "greet", "confidence": 5.512297107657105e-8 }
  ],
  "response_selector": {
    "oscar": {
      "response": { "name": null, "confidence": 0.0 },
      "ranking": [],
      "full_retrieval_intent": null
    }
  },
  "text": "min sales in 2000"
}

Issue 1: { "entity": "postal_code", "start": 13, "end": 17, "value": "2000", "extractor": "DIETClassifier" }

This classification is wrong. Why is 2000 classified as postal_code? No year-like values (2000, 2001…2010) appear among the postal codes in the training data, yet DIETClassifier labels it as postal_code. Please let me know what needs to be fixed.
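To make the conflict easier to see, the spans in the output can be compared: the postal_code span (13 to 17) lies inside the span that Duckling reports as time (10 to 17). The following is a minimal sketch that lists such overlaps. It assumes the parse result has the shape shown in Output 1; `overlapping_pairs` is a hypothetical helper and not part of Rasa, and the entity dicts are trimmed copies of the ones in the output.

```python
def overlapping_pairs(entities):
    """Return (diet_entity, duckling_entity) pairs whose character spans overlap."""
    diet = [e for e in entities if e.get("extractor") == "DIETClassifier"]
    duckling = [e for e in entities if e.get("extractor") == "DucklingHTTPExtractor"]
    return [
        (d, k)
        for d in diet
        for k in duckling
        # Two half-open spans [start, end) overlap when each starts before the other ends.
        if d["start"] < k["end"] and k["start"] < d["end"]
    ]


# Entities trimmed from Output 1 for "min sales in 2000".
entities = [
    {"entity": "action", "start": 0, "end": 3, "value": "minimum", "extractor": "DIETClassifier"},
    {"entity": "measure", "start": 4, "end": 9, "value": "sales", "extractor": "DIETClassifier"},
    {"entity": "postal_code", "start": 13, "end": 17, "value": "2000", "extractor": "DIETClassifier"},
    {"entity": "time", "start": 10, "end": 17, "value": "2000-01-01T00:00:00.000-08:00",
     "extractor": "DucklingHTTPExtractor"},
]

conflicts = overlapping_pairs(entities)
for diet_entity, duckling_entity in conflicts:
    print(diet_entity["entity"], "overlaps", duckling_entity["entity"])
```

A check like this only reports the disagreement between the two extractors; it does not decide which one is right or change what the model predicts.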

Query 2: {"text": "min profit jersey new in 2000"}

Output 2:

{
  "intent": { "name": "PepsiSalesData", "confidence": 1.0 },
  "entities": [
    { "entity": "action", "start": 0, "end": 3, "value": "minimum", "extractor": "DIETClassifier", "processors": ["EntitySynonymMapper"] },
    { "entity": "measure", "start": 4, "end": 17, "value": "profit jersey", "extractor": "DIETClassifier" },
    { "entity": "state", "start": 18, "end": 21, "value": "new", "extractor": "DIETClassifier" },
    { "entity": "postal_code", "start": 26, "end": 30, "value": "2000", "extractor": "DIETClassifier" },
    {
      "start": 23,
      "end": 30,
      "text": "in 2000",
      "value": "2000-01-01T00:00:00.000-08:00",
      "confidence": 1.0,
      "additional_info": {
        "values": [
          { "value": "2000-01-01T00:00:00.000-08:00", "grain": "year", "type": "value" }
        ],
        "value": "2000-01-01T00:00:00.000-08:00",
        "grain": "year",
        "type": "value"
      },
      "entity": "time",
      "extractor": "DucklingHTTPExtractor"
    }
  ],
  "intent_ranking": [
    { "name": "PepsiSalesData", "confidence": 1.0 },
    { "name": "greet", "confidence": 4.486597404707027e-8 }
  ],
  "response_selector": {
    "oscar": {
      "response": { "name": null, "confidence": 0.0 },
      "ranking": [],
      "full_retrieval_intent": null
    }
  },
  "text": "min profit jersey new in 2000"
}

Issue 2: { "entity": "measure", "start": 4, "end": 17, "value": "profit jersey", "extractor": "DIETClassifier" }

The measure entity values are:

sales
quantity
discount
profit

Why is the value coming out as "profit jersey"? It should be only "profit".
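A check along the following lines can flag measure values that fall outside the expected set. It is a minimal sketch: `ALLOWED_MEASURES` is built from the four values listed above, `unexpected_measures` is a hypothetical helper and not part of Rasa, and the entity dicts are trimmed copies of the ones in Output 2.

```python
# The four measure values listed above.
ALLOWED_MEASURES = {"sales", "quantity", "discount", "profit"}


def unexpected_measures(entities):
    """Return measure entities whose value is not one of the allowed measures."""
    return [
        e for e in entities
        if e.get("entity") == "measure" and e.get("value") not in ALLOWED_MEASURES
    ]


# Entities trimmed from Output 2 for "min profit jersey new in 2000".
entities = [
    {"entity": "action", "start": 0, "end": 3, "value": "minimum", "extractor": "DIETClassifier"},
    {"entity": "measure", "start": 4, "end": 17, "value": "profit jersey", "extractor": "DIETClassifier"},
    {"entity": "state", "start": 18, "end": 21, "value": "new", "extractor": "DIETClassifier"},
]

flagged = unexpected_measures(entities)
for e in flagged:
    print("unexpected measure:", repr(e["value"]), "at", (e["start"], e["end"]))
```

This only detects the bad span after the fact; it does not explain why the model merged "jersey" into the measure.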
