Slot unexpectedly filled with an array when an entity is detected / extracted multiple times

I have Rasa 2.0 (from the RasaHQ Docker image) set up with the following pipeline in config:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en
pipeline:
  - name: SpacyNLP
    model: "en_core_web_sm"
  - name: SpacyTokenizer
  - name: custom_components.SimpleNameExtractor
  - name: SpacyEntityExtractor
    dimensions: ["PERSON"] #https://spacy.io/api/annotation#section-named-entities
  - name: SpacyFeaturizer
    pooling: mean
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

I’m using spaCy in combination with my own custom component to extract names; the custom component catches the unusual names that spaCy misses. Everything works as expected.

Sometimes both components extract a PERSON entity. That is useful confirmation, but the name slot then gets filled with an array rather than a single name string. For example:

Input: “My name is Albert”

This gives the output:

{
  "text": "My name is Albert",
  "intent": {
    "id": -4074871007728436796,
    "name": "inform",
    "confidence": 0.99996018409729
  },
  "entities": [
    {
      "value": "albert",
      "confidence_entity": 0.6000000000000001,
      "entity": "PERSON",
      "start": 11,
      "end": 17,
      "extractor": "simple_name_extractor"
    },
    {
      "entity": "PERSON",
      "value": "Albert",
      "start": 11,
      "confidence": null,
      "end": 17,
      "extractor": "SpacyEntityExtractor"
    }

...

When I use the interactive story builder, it fills my name slot like so:

show_name_form 1.00
active_loop{"name": "show_name_form"}
slot{"requested_slot": "name"}
What's your name? Or nickname if you prefer?
slot{"name": ["albert", "Albert"]}
slot{"requested_slot": null}
active_loop{"name": null}
utter_greet_name 1.00
Nice to meet you ['albert', 'Albert']!

I suppose I am missing a step that determines which entity should be selected when more than one component identifies an entity. I’m not sure where to do this; could someone point me in the right direction?
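One way to answer the "which entity wins" question is to collapse duplicates before the slot is filled, for example in a custom component placed after the extractors or at the start of a validation method. The sketch below works on plain entity dicts shaped like the parse output above and keeps one entity per (entity type, start, end) span. The preference order in EXTRACTOR_PRIORITY and the helper names are hypothetical; nothing in this thread says Rasa performs this step itself.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical preference order: when two extractors report the same entity
# type over the same character span, the one listed first is kept.
EXTRACTOR_PRIORITY = ["simple_name_extractor", "SpacyEntityExtractor"]


def extractor_rank(entity: Dict[str, Any]) -> int:
    """Lower rank wins; extractors missing from the list rank last."""
    name = entity.get("extractor")
    if name in EXTRACTOR_PRIORITY:
        return EXTRACTOR_PRIORITY.index(name)
    return len(EXTRACTOR_PRIORITY)


def dedupe_entities(entities: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep a single entity per (entity type, start, end) span."""
    best: Dict[Tuple[Any, Any, Any], Dict[str, Any]] = {}
    for entity in entities:
        key = (entity.get("entity"), entity.get("start"), entity.get("end"))
        # The first entity for a span is kept unless a higher-priority one shows up.
        if key not in best or extractor_rank(entity) < extractor_rank(best[key]):
            best[key] = entity
    kept_ids = {id(e) for e in best.values()}
    # Return the survivors in their original message order.
    return [e for e in entities if id(e) in kept_ids]


# Entities as in the parse output above (confidence rounded to 0.6).
entities = [
    {"value": "albert", "confidence_entity": 0.6, "entity": "PERSON",
     "start": 11, "end": 17, "extractor": "simple_name_extractor"},
    {"entity": "PERSON", "value": "Albert", "start": 11, "confidence": None,
     "end": 17, "extractor": "SpacyEntityExtractor"},
]
print([e["value"] for e in dedupe_entities(entities)])  # ['albert']
```

Ranking by a fixed extractor order sidesteps the fact that the two extractors here do not report comparable scores: the spaCy entry has no confidence_entity at all.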

Edit: For further clarification, the name slot is set in the form like so:

forms:
  show_name_form:
    name:
    - type: from_entity
      entity: PERSON
      not_intent:
      - greet
      - out_of_scope
      - clarification

I had a read through the code, and it looks like this is the logic that fills the slot with a list of entity values, a single value, or None if nothing is found:
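The slot-filling behaviour described here can be approximated in plain Python. This is a simplified sketch, not Rasa's source: the helper name collapse_entity_values is hypothetical, and it ignores the other mapping options, such as the not_intent filter in the form above.

```python
from typing import Any, Dict, List


def collapse_entity_values(entities: List[Dict[str, Any]], entity_name: str) -> Any:
    """Approximate how a from_entity mapping turns extracted entities into a slot value."""
    # Collect every value extracted for this entity type, in message order.
    values = [e["value"] for e in entities if e.get("entity") == entity_name]
    if not values:
        return None  # nothing extracted for this entity type
    if len(values) == 1:
        return values[0]  # a single match becomes a plain value
    return values  # several matches, e.g. from two extractors, stay a list


entities = [
    {"value": "albert", "entity": "PERSON", "start": 11, "end": 17,
     "extractor": "simple_name_extractor"},
    {"value": "Albert", "entity": "PERSON", "start": 11, "end": 17,
     "extractor": "SpacyEntityExtractor"},
]
print(collapse_entity_values(entities, "PERSON"))  # ['albert', 'Albert']
```

With both extractors firing, the result matches the slot{"name": ["albert", "Albert"]} event in the interactive transcript.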

I guess I didn’t expect this. I’m assuming I can override this in my own form?

Success! I ended up adding a custom validation action to select a single value. Ideally, it would select the value with the highest confidence score found in tracker.latest_message['entities'].

This is the first time I’ve created a custom action and a custom component. Does this approach make sense to everyone?

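import json
import logging
from typing import Any, Dict, Text

from rasa_sdk import Tracker
from rasa_sdk.executor import CollectingDispatcher
from rasa_sdk.forms import FormValidationAction
from rasa_sdk.types import DomainDict

logger = logging.getLogger(__name__)


class ValidateShowNameForm(FormValidationAction):
    def name(self) -> Text:
        return "validate_show_name_form"

    def validate_name(
        self,
        slot_value: Any,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: DomainDict,
    ) -> Dict[Text, Any]:
        # More than one extractor found a PERSON entity, so the slot holds a list
        if isinstance(slot_value, list):
            # Keep the last value, i.e. the one from the later extractor in the pipeline
            slot_value = slot_value[-1]
            logger.info("Multiple possible values extracted for name value, using last in pipeline")
            logger.info(json.dumps(tracker.latest_message["entities"]))

        return {"name": slot_value}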

Hi @digitalWestie. Thank you for sharing your solution here; I am sure it will be very useful for other community members.


I used @digitalWestie's approach and added a piece of code to pick the entity with the highest confidence:

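        # Assumes every entity in the message has a numeric "confidence_entity";
        # raises KeyError for an entity without that key.
        # Note: compares every entity type in the message, not only PERSON.
        max_conf = 0.0
        val = ""
        if isinstance(slot_value, list):
            for entity in tracker.latest_message["entities"]:
                if entity["confidence_entity"] >= max_conf:
                    max_conf = entity["confidence_entity"]
                    val = entity["value"]
            slot_value = val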

Hope this helps.


@harshit-sysquo Thanks! However, it does not work when one of the extractors is RegexEntityExtractor (lookup tables), which produces no confidence values.
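A variant of the loop that copes with missing scores might look like the sketch below. Entities whose confidence_entity is absent or None (RegexEntityExtractor, as noted, and also the SpacyEntityExtractor entry in the parse output at the top of the thread) are skipped, only entities of the requested type are compared, and a fallback value is returned when nothing has a score. The function name, the fallback parameter and the sample "Al" regex entity are hypothetical.

```python
from typing import Any, Dict, List, Optional


def pick_highest_confidence(
    entities: List[Dict[str, Any]], entity_name: str, fallback: Any = None
) -> Any:
    """Return the value of the highest-scoring entity of one type, or `fallback`."""
    best_value = fallback
    best_conf: Optional[float] = None
    for entity in entities:
        if entity.get("entity") != entity_name:
            continue  # ignore entities that do not fill this slot
        conf = entity.get("confidence_entity")
        if conf is None:
            continue  # no score (e.g. regex/lookup matches), so it cannot win
        if best_conf is None or conf > best_conf:
            best_conf = conf
            best_value = entity["value"]
    return best_value


# Illustrative entities: two from the parse output above plus a made-up regex match.
entities = [
    {"value": "albert", "confidence_entity": 0.6, "entity": "PERSON",
     "extractor": "simple_name_extractor"},
    {"value": "Albert", "confidence": None, "entity": "PERSON",
     "extractor": "SpacyEntityExtractor"},
    {"value": "Al", "entity": "PERSON", "extractor": "RegexEntityExtractor"},
]
print(pick_highest_confidence(entities, "PERSON"))  # albert
```

Inside validate_name this could replace the loop, for instance with slot_value = pick_highest_confidence(tracker.latest_message["entities"], "PERSON", fallback=slot_value[-1]), which keeps the original "last value" behaviour whenever no entity of that type has a score.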

Nice one 👍🏻