How to set up basic PERSON extraction in English and then include it in an utterance as a variable

Hi

I am experimenting with Rasa and wanted to create a minimal setup that extracts a person's name and uses it in an utterance. Something like “Hi” → “Hi. What’s your name?” → “Paweł” → “Nice to see you, Paweł.” Sadly, it looks like I lack some basic knowledge, because after reading the whole Rasa Open Source docs I still don’t know how to do it :frowning_face:

When I chat with my chatbot in the shell, it does not substitute `{PERSON}` with John:

```
Your input ->  hi
Hi!
What's your name?
Your input ->  my name is John
/Users/pawelbarszcz/Developer/Virbe/research-playground-chatbot-engine/rasa/venv/lib/python3.7/site-packages/rasa/utils/common.py:351: UserWarning: Interpreter parsed an entity 'PERSON' which is not defined in the domain. Please make sure all entities are listed in the domain.
  More info at https://rasa.com/docs/rasa/core/domains/
Nice to meet you, {PERSON} 🙂
```

My current chatbot setup is as follows:

data/nlu.md

```md
## intent:i_greet
- Hi
- Hey

## intent:i_my_name_is
- My name is [John](PERSON)
- I'm [John](PERSON)
```

data/stories.md

```md
## greet + ask for name
* i_greet
    - utter_greet
    - utter_whats_your_name
* i_my_name_is
    - utter_nice_to_meet_you_name
```

tests/conversation_tests.md

```md
## greet + ask for name
* i_greet: Hi
    - utter_greet
    - utter_whats_your_name
* i_my_name_is: My name is [John](PERSON)
    - utter_nice_to_meet_you_name
```

domain.yml

```yaml
session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: false

intents:
  - i_greet:
      use_entities: []
      ignore_entities: []
  - i_my_name_is:
      use_entities:
        - PERSON
      ignore_entities: []

responses:
  utter_greet:
    - text: "Hello!"
    - text: "Hi!"
  utter_whats_your_name:
    - text: "What's your name?"
  utter_nice_to_meet_you_name:
    - text: "Nice to meet you, {PERSON} 🙂"
```

config.yml

(`policies` definitely looks unnecessarily complex here; it is a leftover from another example of mine focused on graph-based conversations)

```yaml
language: en
pipeline:
  - name: "SpacyNLP"
    model: "en_core_web_md"
    case_sensitive: False
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: "SpacyEntityExtractor"
    dimensions: ["PERSON"]
  - name: "EntitySynonymMapper"
  - name: "SklearnIntentClassifier"

policies:
  - name: "AugmentedMemoizationPolicy"
    max_history: 10
  - name: "TEDPolicy"
    featurizer:
      - name: "MaxHistoryTrackerFeaturizer"
        max_history: 10
        state_featurizer:
          - name: "BinarySingleStateFeaturizer"
  - name: "MappingPolicy"
```

FYI, when I added

```yaml
entities:
  - PERSON
```

to domain.yml, the tests started failing.

failed_stories.md:

```md
## greet + ask for name (/var/folders/x3/81nczz_5387b16lxy9ycsvdh0000gn/T/tmp_5lnbtns/b5956441149c4583b56bdee3af1de47d_conversation_tests.md)
* i_greet: Hi
    - utter_greet
    - utter_whats_your_name
* i_my_name_is: My name is [John](PERSON)
    - utter_nice_to_meet_you_name   <!-- predicted: utter_whats_your_name -->
    - action_listen   <!-- predicted: utter_whats_your_name -->
```

Chatting in the shell then goes as follows:

```
Your input ->  hi
Hello!
What's your name?
Your input ->  my name is John
What's your name?
```

Hello Pawel, welcome to the forum!

What do you get if you run `rasa shell nlu` and type “my name is John”?

(I tried both `rasa shell nlu` and `GET /model/parse`, and I deduce they use the same functionality under the hood.)

Without the PERSON entity in the domain (tests passing):

```json
{
  "intent": {
    "name": "i_my_name_is",
    "confidence": 0.8533644285063952
  },
  "entities": [
    {
      "entity": "PERSON",
      "value": "John",
      "start": 11,
      "confidence": null,
      "end": 15,
      "extractor": "SpacyEntityExtractor"
    }
  ],
  "intent_ranking": [
    {
      "name": "i_my_name_is",
      "confidence": 0.8533644285063952
    },
    {
      "name": "i_greet",
      "confidence": 0.14663557149360515
    }
  ],
  "text": "my name is John"
}
```

With the PERSON entity in the domain (tests failing), the results look identical at first sight:

```json
{
  "intent": {
    "name": "i_my_name_is",
    "confidence": 0.8533644285063952
  },
  "entities": [
    {
      "entity": "PERSON",
      "value": "John",
      "start": 11,
      "confidence": null,
      "end": 15,
      "extractor": "SpacyEntityExtractor"
    }
  ],
  "intent_ranking": [
    {
      "name": "i_my_name_is",
      "confidence": 0.8533644285063952
    },
    {
      "name": "i_greet",
      "confidence": 0.14663557149360515
    }
  ],
  "text": "my name is John"
}
```

Based on the output above, my guess is that the entity is recognised properly (at least when passing exactly the same name as in the nlu.md data), but somehow it’s not injected into the response utterance.
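As far as the shell output above suggests, responses are filled from slot values rather than directly from extracted entities, and a placeholder with no matching slot is left as is. Below is a simplified, stdlib-only model of that behaviour; it is an assumption for illustration, not Rasa's actual templating code, and `fill_template` is a hypothetical name:

```python
# Simplified model of response templating (an assumption, not Rasa's code):
# placeholders are filled from slot values only, and a template whose
# placeholder has no matching slot is returned unchanged.
from typing import Any, Dict


def fill_template(template: str, slots: Dict[str, Any]) -> str:
    """Fill {placeholders} in a response text from a dict of slot values."""
    try:
        return template.format(**slots)
    except KeyError:
        # No slot with that name: keep the text as written, {PERSON} and all.
        return template


template = "Nice to meet you, {PERSON} 🙂"

# Extracted entities are not passed in here, only slots.
print(fill_template(template, {}))                  # Nice to meet you, {PERSON} 🙂
print(fill_template(template, {"PERSON": None}))    # Nice to meet you, None 🙂
print(fill_template(template, {"PERSON": "John"}))  # Nice to meet you, John 🙂
```

In this model an existing but empty slot renders as `None`, which is one reason a default such as `initial_value: "human"` is handy.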

Oh, right. You need to have both a slot and an entity that the slot can be filled with. If the slot name and the entity name are the same, the slot is filled automatically. See Slots. So you should have both

```yaml
slots:
  PERSON:
    type: text
    initial_value: "human"
```

and the `entities` field in your domain file.
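The automatic filling described here (the same mechanism that the `auto_fill: false` option switches off later in this thread) can be pictured with a small stdlib-only model. It is a sketch under the assumption that entities arrive shaped like the `/model/parse` output above; `auto_fill_slots` is a hypothetical name, not a Rasa function:

```python
# Simplified model (an assumption, not Rasa source) of slot auto-filling:
# an extracted entity fills the slot that has exactly the same name.
from typing import Any, Dict, Iterable, List


def auto_fill_slots(entities: List[Dict[str, Any]], slot_names: Iterable[str]) -> Dict[str, Any]:
    """Return {slot_name: value} for every entity whose name matches a slot."""
    known = set(slot_names)
    filled: Dict[str, Any] = {}
    for ent in entities:
        if ent["entity"] in known:
            # later entities of the same type overwrite earlier ones
            filled[ent["entity"]] = ent["value"]
    return filled


# Entities as returned by /model/parse for "my name is John" (trimmed).
entities = [{"entity": "PERSON", "value": "John", "start": 11, "end": 15}]

print(auto_fill_slots(entities, ["PERSON"]))    # {'PERSON': 'John'}
print(auto_fill_slots(entities, ["s_person"]))  # {}: names differ, nothing is filled
```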


1. Slot defined, same config

With PERSON in both `entities` and `slots`, the tests still fail (new output below) and the bot gives no answer (see below too).

failed_stories.md

```md
## greet + ask for name (/var/folders/x3/81nczz_5387b16lxy9ycsvdh0000gn/T/tmpsze_x_j1/a049d505f524449fa04bc9a57c34ca8a_conversation_tests.md)
* i_greet: Hi
    - utter_greet
    - utter_whats_your_name
* i_my_name_is: My name is [John](PERSON)
    - slot{"PERSON": "John"}
    - utter_nice_to_meet_you_name   <!-- predicted: action_listen -->
```

Conversation:

```
Your input ->  hi
Hi!
What's your name?
Your input ->  my name is John
Your input ->
```

2. Slot defined, now with another config

With both the entity and the slot defined, I also tried different policies:

```yaml
policies:
  - name: "MemoizationPolicy"
  - name: "TEDPolicy"
    max_history: 5
    epochs: 100
  - name: "MappingPolicy"
```

These are from yet another example I was working with.

Now it works for “my name is John”:

```
Your input ->  hi
Hello!
What's your name?
Your input ->  my name is John
Nice to meet you, John 🙂
```

Thanks, @j.mosig :tada:

3. Further questions

Now that the basic case (the exact user message and PERSON value from nlu.md) works, I will continue exploring and write down some questions to understand things better, but in a separate reply in this thread (I have to go AFK for now).

OK, so here are my questions, @j.mosig

  1. Is it possible to make it work for non-English characters? My name, Paweł, results in “Nice to meet you, human 🙂”, but for Pawel it’s correctly “Nice to meet you, Pawel 🙂”. I’m thinking of the case where the user speaks English but has a non-English name.

  2. How can I learn more about the entities provided by an extractor? For SpacyEntityExtractor I see PERSON, LOC, ORG, and PRODUCT in the docs, but how can I know what they mean (apart from guessing that LOC might be some kind of “location”) and what their types are? (You helped me, @j.mosig, with `type: text`, but how did you know that PERSON is of type text?)

  3. I see my bot works for names other than John/hohn. Is it possible to test that? I tried the `- slot{…}` notation, but the tests pass even when the value next to the intent differs from the one in `- slot{…}`. See below (note that there is one extra utter_nice_to_meet_you_name in the greeting compared to my previous snippets, as I wanted to test the slot’s initial value):

```md
## greet John
* i_greet: Hi
    - utter_greet
    - utter_nice_to_meet_you_name
    - utter_whats_your_name
* i_my_name_is: My name is [John](PERSON)
    - slot{"PERSON": "Mike"}
    - utter_nice_to_meet_you_name
```
  4. I struggle to get a slot name that differs from the entity name (I prefer to give different concepts different names, as it makes it easier to understand what, how and why :slightly_smiling_face:). See my attempt below, where the tests, sadly, pass even though I don’t know how to pass the recognised entity to the given slot:

```
#
# excerpt from domain.yml
#
entities:
  - PERSON

slots:
  s_person:
    type: text
    initial_value: "human"

responses:
  # …
  utter_nice_to_meet_you_name:
    - text: "Nice to meet you, {s_person} 🙂"

#
# excerpt from nlu.md
#

## intent:i_my_name_is
- My name is [John](PERSON)

#
# excerpt from tests
#

## greet John
* i_greet: Hi
    - utter_greet
    - utter_nice_to_meet_you_name
    - utter_whats_your_name
* i_my_name_is: My name is [John](PERSON)
    - utter_nice_to_meet_you_name
```

Regarding my 4th question, I found a solution. It looks hacky, but it works. Is it the simplest one possible?

Details below:

```md
#
# whole stories.md
#

## greet + ask for name
* i_greet
    - utter_greet
    - utter_nice_to_meet_you_name
    - utter_whats_your_name
* i_my_name_is
    - f_ask_for_person_name
    - utter_nice_to_meet_you_name

#
# excerpt from conversation_tests.md
#

## greet John
* i_greet: Hi
    - utter_greet
    - utter_nice_to_meet_you_name
    - utter_whats_your_name
* i_my_name_is: My name is [John](PERSON)
    - f_ask_for_person_name
    - utter_nice_to_meet_you_name
```

```yaml
#
# excerpts from domain.yml
#

forms:
  - f_ask_for_person_name

entities:
  - PERSON

slots:
  s_person_name:
    type: text
    initial_value: "human"
    auto_fill: false

intents:
  - i_my_name_is:
      use_entities:
        - PERSON
      ignore_entities: []

responses:
  utter_nice_to_meet_you_name:
    - text: "Nice to meet you, {s_person_name} 🙂"

#
# excerpt from config.yml
#

policies:
  - name: "MemoizationPolicy"
  - name: "TEDPolicy"
    max_history: 5
    epochs: 100
  - name: "MappingPolicy"
  - name: "FormPolicy"
```

```python
#
# whole actions.py
#

from typing import Any, Text, Dict, List, Union

from rasa_sdk import Tracker
from rasa_sdk.executor import CollectingDispatcher
from rasa_sdk.forms import FormAction


class FormAskForPersonName(FormAction):

    def name(self) -> Text:
        # must match the form name listed under `forms:` in domain.yml
        return "f_ask_for_person_name"

    @staticmethod
    def required_slots(tracker) -> List[Text]:
        # the form keeps asking until every slot listed here is filled
        return ["s_person_name"]

    def slot_mappings(self) -> Dict[Text, Union[Dict, List[Dict[Text, Any]]]]:
        # fill s_person_name from the PERSON entity, even though the names differ
        return {
            "s_person_name": [self.from_entity(entity="PERSON")]
        }

    def submit(self, dispatcher: CollectingDispatcher, tracker: Tracker, domain: Dict[Text, Any]) -> List[Dict]:
        # nothing to do on completion; the story continues with utter_nice_to_meet_you_name
        return []
```

A sample chat goes as follows:

```
Your input ->  hi
Hi!
Nice to meet you, human 🙂
What's your name?
Your input ->  My name? It's Mike Smith.
Nice to meet you, Mike Smith 🙂
```

Hi @pawelbarszcz

Sorry for the late reply, I was on vacation.

> Is it possible to make it work for non-English characters?

Yes, this should work if you add a few examples of names with non-Latin characters. You can use synonyms for that (see https://rasa.com/docs/rasa/nlu/training-data-format/#id3).
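Synonym mapping works roughly like this: the extracted value, lowercased, is looked up in a synonym table and replaced by its canonical form. The sketch below is modelled loosely on that idea, not on EntitySynonymMapper's actual source; `replace_synonyms` and the table contents are hypothetical. Keep in mind that synonym mapping only rewrites a value after some extractor has found it; it does not by itself make an extractor recognise a name it misses.

```python
# Hypothetical sketch of synonym replacement: an extracted value, lowercased,
# is looked up in a synonym table and swapped for its canonical spelling.
from typing import Dict, List


def replace_synonyms(entities: List[Dict[str, str]], synonyms: Dict[str, str]) -> List[Dict[str, str]]:
    """Return copies of the entities with synonym values mapped to canonical ones."""
    result = []
    for ent in entities:
        ent = dict(ent)  # do not mutate the caller's data
        canonical = synonyms.get(ent["value"].lower())
        if canonical is not None:
            ent["value"] = canonical
        result.append(ent)
    return result


# Illustrative table: ASCII spelling -> spelling with the Polish letter.
synonyms = {"pawel": "Paweł"}

print(replace_synonyms([{"entity": "PERSON", "value": "Pawel"}], synonyms))
# [{'entity': 'PERSON', 'value': 'Paweł'}]
```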

> How can I know more about entities provided by an extractor?

If you use an external library, such as spaCy, for entity extraction, then you need to refer to its documentation. For example, see https://spacy.io/api/annotation#section-named-entities for the spaCy entities.
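For the four labels mentioned in the question, the descriptions below are paraphrased from spaCy's named-entity annotation docs (OntoNotes 5 scheme); with spaCy installed, `spacy.explain("PERSON")` returns a similar one-line description. On the type question, whatever label is assigned, the extracted `value` is the matched span of text (e.g. `"John"` in the parse output earlier in this thread), which is why a `text` slot fits. `SPACY_LABELS` and `describe` are hypothetical names for this sketch:

```python
# Descriptions of a few spaCy entity labels, paraphrased from the spaCy
# annotation docs. The lookup helper itself is hypothetical.
from typing import Dict, List

SPACY_LABELS: Dict[str, str] = {
    "PERSON": "People, including fictional",
    "ORG": "Companies, agencies, institutions, etc.",
    "LOC": "Non-GPE locations, mountain ranges, bodies of water",
    "PRODUCT": "Objects, vehicles, foods, etc. (not services)",
}


def describe(entities: List[Dict[str, str]]) -> List[str]:
    """Pair each extracted entity value with a description of its label."""
    return [
        f"{e['value']!r} -> {e['entity']}: {SPACY_LABELS.get(e['entity'], 'unknown label')}"
        for e in entities
    ]


for line in describe([{"entity": "PERSON", "value": "John"}]):
    print(line)  # 'John' -> PERSON: People, including fictional
```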

> I see my bot works for names different than John/hohn. Is it possible to test it?

Your training stories should only contain correct examples. You can then test generalization with test stories (see, e.g., Testing Your Assistant). I am not sure what you mean, though.
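The thread does not settle why the test run accepts `[John](PERSON)` followed by `slot{"PERSON": "Mike"}`. If the goal is simply to catch such inconsistencies in hand-written test stories, a small standalone check is one option. This is a hypothetical helper, not part of `rasa test`, and it assumes the Markdown story format used in this thread (`* intent: text` user lines, `- slot{...}` event lines):

```python
# Hypothetical helper (not part of `rasa test`): report slot events whose
# value differs from the entity annotated in the preceding user message.
import json
import re
from typing import Dict, List, Tuple

ENTITY_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")  # [value](ENTITY)
SLOT_RE = re.compile(r"^\s*-\s*slot(\{.*\})\s*$")  # - slot{"NAME": "value"}


def find_mismatches(story: str) -> List[Tuple[str, str, str]]:
    """Return (slot, annotated value, slot value) for each inconsistent slot event."""
    annotated: Dict[str, str] = {}
    mismatches = []
    for line in story.splitlines():
        if line.lstrip().startswith("*"):
            # New user turn: remember what its entity annotations say.
            annotated = {name: value for value, name in ENTITY_RE.findall(line)}
            continue
        match = SLOT_RE.match(line)
        if match:
            for slot, value in json.loads(match.group(1)).items():
                if slot in annotated and annotated[slot] != value:
                    mismatches.append((slot, annotated[slot], value))
    return mismatches


story = """## greet John
* i_greet: Hi
    - utter_greet
* i_my_name_is: My name is [John](PERSON)
    - slot{"PERSON": "Mike"}
    - utter_nice_to_meet_you_name
"""
print(find_mismatches(story))  # [('PERSON', 'John', 'Mike')]
```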

> I struggle to achieve slot name different than entity

Good point, but the easiest way is indeed to give them identical names. If you name slots differently from entities, then you have two options: (i) you change your stories such that whenever the user provides the corresponding entity, the next line is `- slot{"s_person": ...}`, or (ii) you use a Form and define how slots are extracted (see Forms).
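Option (ii) is what the form-based solution earlier in this thread does with `slot_mappings()` and `self.from_entity(entity="PERSON")`. A minimal stdlib-only model of the idea, assuming entities arrive shaped like the `/model/parse` output; `SLOT_MAPPINGS` and `extract_slots` are hypothetical names, not the rasa_sdk API, and taking the first matching entity is a simplification:

```python
# Hypothetical model of option (ii): an explicit table decides which entity
# fills which slot, so slot and entity names can differ. Not the rasa_sdk API.
from typing import Any, Dict, List

# slot name -> entity name it should be filled from
SLOT_MAPPINGS: Dict[str, str] = {"s_person_name": "PERSON"}


def extract_slots(entities: List[Dict[str, Any]], mappings: Dict[str, str]) -> Dict[str, Any]:
    """Fill each mapped slot from the first entity with the matching name."""
    slots: Dict[str, Any] = {}
    for slot, entity_name in mappings.items():
        for ent in entities:
            if ent["entity"] == entity_name:
                slots[slot] = ent["value"]
                break  # simplification: first match wins
    return slots


entities = [{"entity": "PERSON", "value": "Mike Smith"}]
print(extract_slots(entities, SLOT_MAPPINGS))  # {'s_person_name': 'Mike Smith'}
```

Here the mapping table, not a naming convention, decides which entity fills which slot, which is what lets `s_person_name` and `PERSON` keep different names.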

I hope this helps! :slightly_smiling_face:


Thanks for your answers @j.mosig !