DIETClassifier assigns a token to an entity that is not expected

Below is my configuration:

pipeline:
  - name: WhitespaceTokenizer
    "case_sensitive": False
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
    features: [
      ["low", "title", "upper"],
      ["BOS", "EOS", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit"],
      ["low", "title", "upper"],
    ]
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 50
    batch_strategy: sequence
    maximum_positive_similarity: 1.0
  - name: "DucklingHTTPExtractor"
    url: "http://localhost:8000"
    dimensions: ["time", "amount-of-money", "ordinal", "duration", "email", "phone-number"]
    locale: "en_EN"
    timeout: 3
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 500
    retrieval_intent: default

Sample entity training data for postal_code in the nlu.md file:

Query1: {"text": "min sales in 2000"}
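For anyone who wants to reproduce this, here is a minimal sketch of how such a query could be sent to a running Rasa server. It assumes the server is started with the HTTP API enabled and listens on the default local address `http://localhost:5005`; the helper names `build_parse_request` and `parse` are hypothetical and only used for illustration.

```python
import json
import urllib.request

# Assumption: a Rasa server is running locally on its default port.
RASA_URL = "http://localhost:5005/model/parse"


def build_parse_request(text, url=RASA_URL):
    """Build (but do not send) a POST request carrying the query text as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse(text):
    """Send the query to the server and return the decoded JSON response."""
    with urllib.request.urlopen(build_parse_request(text)) as resp:
        return json.load(resp)


# Building the request does not need a server; calling parse() does.
request = build_parse_request("min sales in 2000")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `parse("min sales in 2000")` against a live server should return a response shaped like the outputs in this post.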

Output1:
{
  "intent": { "name": "PepsiSalesData", "confidence": 1.0 },
  "entities": [
    { "entity": "action", "start": 0, "end": 3, "value": "minimum", "extractor": "DIETClassifier", "processors": [ "EntitySynonymMapper" ] },
    { "entity": "measure", "start": 4, "end": 9, "value": "sales", "extractor": "DIETClassifier" },
    { "entity": "postal_code", "start": 13, "end": 17, "value": "2000", "extractor": "DIETClassifier" },
    {
      "start": 10,
      "end": 17,
      "text": "in 2000",
      "value": "2000-01-01T00:00:00.000-08:00",
      "confidence": 1.0,
      "additional_info": {
        "values": [ { "value": "2000-01-01T00:00:00.000-08:00", "grain": "year", "type": "value" } ],
        "value": "2000-01-01T00:00:00.000-08:00",
        "grain": "year",
        "type": "value"
      },
      "entity": "time",
      "extractor": "DucklingHTTPExtractor"
    }
  ],
  "intent_ranking": [
    { "name": "PepsiSalesData", "confidence": 1.0 },
    { "name": "greet", "confidence": 5.512297107657105e-8 }
  ],
  "response_selector": {
    "oscar": { "response": { "name": null, "confidence": 0.0 }, "ranking": [], "full_retrieval_intent": null }
  },
  "text": "min sales in 2000"
}

Issue 1: { "entity": "postal_code", "start": 13, "end": 17, "value": "2000", "extractor": "DIETClassifier" }

This classification is wrong. Why is 2000 classified as postal_code? None of the postal codes in the training data looks like a year (2000, 2001 … 2010), yet DIETClassifier still labels it as postal_code. Please let me know what needs to be fixed.
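To make the conflict concrete, here is a small sketch that checks the entities from Output1 for overlapping character spans. Only the fields needed for the check are copied over, and the helper name `overlapping_pairs` is hypothetical. It is not a fix, just a way to show that the DIETClassifier span for postal_code sits inside the Duckling span for time.

```python
# Entities copied from Output1 (only the fields needed for the check).
entities = [
    {"entity": "action", "start": 0, "end": 3, "extractor": "DIETClassifier"},
    {"entity": "measure", "start": 4, "end": 9, "extractor": "DIETClassifier"},
    {"entity": "postal_code", "start": 13, "end": 17, "extractor": "DIETClassifier"},
    {"entity": "time", "start": 10, "end": 17, "extractor": "DucklingHTTPExtractor"},
]


def overlapping_pairs(entities):
    """Return (entity_a, entity_b) name pairs whose character spans overlap."""
    pairs = []
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            # Two half-open spans [start, end) overlap when each starts
            # before the other one ends.
            if a["start"] < b["end"] and b["start"] < a["end"]:
                pairs.append((a["entity"], b["entity"]))
    return pairs


conflicts = overlapping_pairs(entities)
print(conflicts)  # [('postal_code', 'time')]
```

The same token "2000" is therefore reported twice: once as postal_code by DIETClassifier and once as part of "in 2000" by DucklingHTTPExtractor.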

Query2: {"text": "min profit jersey new in 2000"}

Output2:
{
  "intent": { "name": "PepsiSalesData", "confidence": 1.0 },
  "entities": [
    { "entity": "action", "start": 0, "end": 3, "value": "minimum", "extractor": "DIETClassifier", "processors": [ "EntitySynonymMapper" ] },
    { "entity": "measure", "start": 4, "end": 17, "value": "profit jersey", "extractor": "DIETClassifier" },
    { "entity": "state", "start": 18, "end": 21, "value": "new", "extractor": "DIETClassifier" },
    { "entity": "postal_code", "start": 26, "end": 30, "value": "2000", "extractor": "DIETClassifier" },
    {
      "start": 23,
      "end": 30,
      "text": "in 2000",
      "value": "2000-01-01T00:00:00.000-08:00",
      "confidence": 1.0,
      "additional_info": {
        "values": [ { "value": "2000-01-01T00:00:00.000-08:00", "grain": "year", "type": "value" } ],
        "value": "2000-01-01T00:00:00.000-08:00",
        "grain": "year",
        "type": "value"
      },
      "entity": "time",
      "extractor": "DucklingHTTPExtractor"
    }
  ],
  "intent_ranking": [
    { "name": "PepsiSalesData", "confidence": 1.0 },
    { "name": "greet", "confidence": 4.486597404707027e-8 }
  ],
  "response_selector": {
    "oscar": { "response": { "name": null, "confidence": 0.0 }, "ranking": [], "full_retrieval_intent": null }
  },
  "text": "min profit jersey new in 2000"
}

Issue 2: { "entity": "measure", "start": 4, "end": 17, "value": "profit jersey", "extractor": "DIETClassifier" }. Measure entities are

. Why does the value come out as "profit jersey"? It should be only "profit".
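To show what the measure span actually covers, here is a small sketch that splits the query on whitespace and lists the tokens that fall inside the reported span (start 4, end 17). The regex split is only an approximation of WhitespaceTokenizer, and the helper name `tokens_in_span` is hypothetical.

```python
import re

text = "min profit jersey new in 2000"
measure = {"start": 4, "end": 17}  # span reported for the measure entity


def tokens_in_span(text, start, end):
    """Return the whitespace-separated tokens that lie fully inside [start, end)."""
    return [
        m.group()
        for m in re.finditer(r"\S+", text)
        if m.start() >= start and m.end() <= end
    ]


covered = tokens_in_span(text, measure["start"], measure["end"])
print(covered)  # ['profit', 'jersey']
```

So DIETClassifier tagged two adjacent tokens as measure, and they were reported as a single entity, even though only "profit" is expected to be a measure.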