Entity value returned does not match entity value in custom component

Issue Summary

Problem: Inconsistent entity values between a custom component and the final Rasa response when extracting product names.

Example:

  • Input text: “washer and dryer brackets”
  • Custom component extracts:
    • Value: “washer”
    • Value: “dryer brackets”
  • Final Rasa response:
    • Value: “washer brackets”
    • Value: “dryer brackets”

Pipeline Configuration

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: addons.my_custom_components.EntityTypoFixer
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

Custom Component Log

Log from EntityTypoFixer:

{
  "entities": [
    {
      "confidence_entity": 0.9867187142372131,
      "end": 6,
      "entity": "product",
      "extractor": "DIETClassifier",
      "start": 0,
      "value": "washer"
    },
    {
      "confidence_entity": 0.9957659244537354,
      "end": 21,
      "entity": "product",
      "extractor": "DIETClassifier",
      "start": 11,
      "value": "dryer brackets"
    }
  ],
  "intent": {
    "confidence": 1.0,
    "name": "get_product"
  },
  "intent_ranking": [
    {
      "confidence": 1.0,
      "name": "get_product"
    },
    {
      "confidence": 7.863674884629312e-26,
      "name": "workaround"
    }
  ],
  "message_id": "ba741fb8a4c6437e9cfd87e595d57078",
  "text": "washer and dryer brackets",
  "text_tokens": [
    {"value": "washer", "start": 0, "end": 6},
    {"value": "and", "start": 7, "end": 10},
    {"value": "dryer", "start": 11, "end": 16},
    {"value": "brackets", "start": 17, "end": 25}
  ]
}

Final Rasa Response Log

Log from the final Rasa response:

{
  "text": "washer and dryer brackets",
  "intent": {
    "name": "get_product",
    "confidence": 1.0
  },
  "entities": [
    {
      "entity": "product",
      "start": 0,
      "end": 6,
      "confidence_entity": 0.9867187142372131,
      "value": "washer brackets",
      "extractor": "DIETClassifier",
      "processors": ["EntitySynonymMapper"]
    },
    {
      "entity": "product",
      "start": 11,
      "end": 25,
      "confidence_entity": 0.9957659244537354,
      "value": "dryer brackets",
      "extractor": "DIETClassifier",
      "processors": ["EntitySynonymMapper"]
    }
  ],
  "text_tokens": [
    [0, 6],
    [7, 10],
    [11, 16],
    [17, 25]
  ],
  "intent_ranking": [
    {
      "name": "get_product",
      "confidence": 1.0
    },
    {
      "name": "workaround",
      "confidence": 7.863674884629312e-26
    }
  ],
  "response_selector": {
    "all_retrieval_intents": [],
    "default": {
      "response": {
        "responses": null,
        "confidence": 0.0,
        "intent_response_key": null,
        "utter_action": "utter_None"
      },
      "ranking": []
    }
  }
}

Custom Component Method

Here’s how I am logging the entities in my custom component:

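from pprint import pprint
from typing import List

from rasa.engine.graph import GraphComponent
from rasa.shared.nlu.training_data.message import Message


class EntityTypoFixer(GraphComponent):
    def process(self, messages: List[Message]) -> List[Message]:
        # Rasa Open Source calls this method during inference.
        for message in messages:
            # Dump the full parse state as this component sees it.
            pprint(message.as_dict())

        return messages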
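
To pin down exactly which fields differ between the two dumps, the entity lists can be compared field by field. The following is a minimal sketch in plain Python: `diff_entities` is a hypothetical helper (not part of Rasa), it assumes entities can be matched by their `start` offset, and the two input lists are copied from the logs above.

```python
from typing import Any, Dict, List, Tuple

Entity = Dict[str, Any]


def diff_entities(
    before: List[Entity],
    after: List[Entity],
    fields: Tuple[str, ...] = ("start", "end", "value"),
) -> List[Tuple[int, str, Any, Any]]:
    """Return (start, field, before_value, after_value) for every differing field.

    Hypothetical helper: entities are matched by their `start` offset.
    """
    after_by_start = {ent["start"]: ent for ent in after}
    diffs = []
    for ent in before:
        other = after_by_start.get(ent["start"])
        if other is None:
            # No entity with the same start offset in the second dump.
            diffs.append((ent["start"], "missing", ent.get("value"), None))
            continue
        for field in fields:
            if ent.get(field) != other.get(field):
                diffs.append((ent["start"], field, ent.get(field), other.get(field)))
    return diffs


# Entities as logged inside EntityTypoFixer.
component_entities = [
    {"start": 0, "end": 6, "value": "washer"},
    {"start": 11, "end": 21, "value": "dryer brackets"},
]

# Entities as they appear in the final Rasa response.
final_entities = [
    {"start": 0, "end": 6, "value": "washer brackets"},
    {"start": 11, "end": 25, "value": "dryer brackets"},
]

diffs = diff_entities(component_entities, final_entities)
for diff in diffs:
    print(diff)
```

On the logged data this reports two differences: the `value` of the first entity ("washer" vs. "washer brackets") and the `end` offset of the second entity (21 vs. 25).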

Question

Why is there a discrepancy between the entity values logged by my custom component (EntityTypoFixer) and the final response provided by Rasa? How can I ensure consistency between these values?

Note: This only occurs when using entity roles.

It looks like the DIETClassifier is also extracting these entities. You could either disable entity extraction in the DIETClassifier by setting entity_recognition: False, or use a custom action to decide whether to take the entities from DIET or from your custom entity extractor.

Another option is to drop your custom entity extractor and try using synonyms.
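
Since the final response lists EntitySynonymMapper under "processors", one quick check is whether each entity value still matches its character span in the original text. The sketch below is plain Python under that assumption: `rewritten_values` is a hypothetical helper (not a Rasa API), and the sample data is copied from the final response log above.

```python
from typing import Any, Dict, List

Entity = Dict[str, Any]


def rewritten_values(text: str, entities: List[Entity]) -> List[Entity]:
    """Return entities whose value differs from the text at [start:end].

    Hypothetical helper: a mismatch suggests the value was changed after
    extraction, for example by a synonym mapping.
    """
    return [ent for ent in entities if text[ent["start"]:ent["end"]] != ent["value"]]


text = "washer and dryer brackets"

# Entities from the final Rasa response.
final_entities = [
    {"start": 0, "end": 6, "value": "washer brackets",
     "processors": ["EntitySynonymMapper"]},
    {"start": 11, "end": 25, "value": "dryer brackets",
     "processors": ["EntitySynonymMapper"]},
]

flagged = rewritten_values(text, final_entities)
for ent in flagged:
    span = text[ent["start"]:ent["end"]]
    print(f"span {span!r} was reported as {ent['value']!r}")
```

For the logged response, only the first entity is flagged: its span is "washer" but its reported value is "washer brackets". If that is unintended, the synonym entries in the training data would be a reasonable place to look.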