How to use OOV to extract any person name?

Rasa Version: 1.10.11

Hi, I am trying to extract a person’s name from a word or sentence and set it into a slot.

I want the bot to handle all the following 5 cases.
example 1:
Bot: Please enter your name?
User: Akhil

example 2:
Bot: Please enter your name?
User: asldfkaskdfh

example 3:
Bot: Please enter your name?
User: My name is Akhil.

example 4:
Bot: Please enter your name?
User: My name is kjasdhfkkjasdf

example 5:
Bot: Please enter your name?
User: It’s asdfasdf

I’ve seen how the OOV token is used in the Sara bot and implemented it in a similar way, but the bot still fails to extract many Indian names.

The output of rasa shell nlu:

Next message:
my name is sai
{
  "intent": {
    "name": "inform",
    "confidence": 0.9997296333312988
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "inform",
      "confidence": 0.9997296333312988
    },
    {
      "name": "claim_status_enquiry",
      "confidence": 0.0002703829959500581
    }
  ],
  "text": "my name is sai"
}
Next message:
my name is linga
{
  "intent": {
    "name": "inform",
    "confidence": 0.9999778270721436
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "inform",
      "confidence": 0.9999778270721436
    },
    {
      "name": "claim_status_enquiry",
      "confidence": 2.2214911950868554e-05
    }
  ],
  "text": "my name is linga"
}
Next message:

I get the following logs while training

(tensorflow) PS O:\Office\Chatbot\HealthCareChatbot\Chatbot\latest_vtest_v2> rasa train --debug
2020-09-02 14:38:38 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\core\healthcare.md' is 'unk'.
2020-09-02 14:38:38 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\healthcare.md' is 'md'.
2020-09-02 14:38:38 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\inform.md' is 'md'.
2020-09-02 14:38:38 DEBUG    pykwalify.compat  - Using yaml library: c:\users\akhilesh\.conda\envs\tensorflow\lib\site-packages\ruamel\yaml\__init__.py
2020-09-02 14:38:39 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\healthcare.md' is 'md'.
2020-09-02 14:38:39 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\inform.md' is 'md'.
2020-09-02 14:38:39 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\healthcare.md' is 'md'.
2020-09-02 14:38:39 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\inform.md' is 'md'.
2020-09-02 14:38:41 DEBUG    rasa.model  - Extracted model to 'C:\Users\Akhilesh\AppData\Local\Temp\tmpmsrdo7jr'.
2020-09-02 14:38:42 INFO     rasa.model  - Data (version) for Core model section changed.
2020-09-02 14:38:42 INFO     rasa.model  - Data (version) for NLU model section changed.
Training Core model...
2020-09-02 14:38:53 DEBUG    rasa.core.nlg.generator  - Instantiated NLG to 'TemplatedNaturalLanguageGenerator'.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Generated trackers will be deduplicated based on their unique last 5 states.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Number of augmentation rounds is 3
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Starting data generation round 0 ... (with 1 trackers)
Processed Story Blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s, # trackers=1]
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Finished phase (1 training samples found).
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Data generation rounds finished.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Found 0 unused checkpoints
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Starting augmentation round 0 ... (with 1 trackers)
Processed Story Blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s, # trackers=1]
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Finished phase (2 training samples found).
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Starting augmentation round 1 ... (with 2 trackers)
Processed Story Blocks: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 969.33it/s, # trackers=1]
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Finished phase (4 training samples found).
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Starting augmentation round 2 ... (with 3 trackers)
Processed Story Blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s, # trackers=1]
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Finished phase (6 training samples found).
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Found 6 training trackers.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Subsampled to 5 augmented training trackers.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - There are 1 original trackers.
2020-09-02 14:38:53 DEBUG    rasa.core.agent  - Agent trainer got kwargs: {}
2020-09-02 14:38:53 DEBUG    rasa.core.featurizers  - Creating states and action examples from collected trackers (by MaxHistoryTrackerFeaturizer(NoneType))...
Processed trackers: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 256.41it/s, # actions=7]
2020-09-02 14:38:53 DEBUG    rasa.core.featurizers  - Created 7 action examples.
Processed actions: 7it [00:00, 87.17it/s, # examples=7]
2020-09-02 14:38:53 DEBUG    rasa.core.policies.memoization  - Memorized 7 unique examples.
2020-09-02 14:38:53 DEBUG    rasa.core.featurizers  - Creating states and action examples from collected trackers (by MaxHistoryTrackerFeaturizer(LabelTokenizerSingleStateFeaturizer))...
Processed trackers: 100%|████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 82.10it/s, # actions=11]
2020-09-02 14:38:53 DEBUG    rasa.core.featurizers  - Created 11 action examples.
2020-09-02 14:39:09 DEBUG    rasa.utils.tensorflow.models  - Building tensorflow train graph...
2020-09-02 14:39:27 DEBUG    rasa.utils.tensorflow.models  - Finished building tensorflow train graph.
Epochs: 100%|█████████████████████████████████████████████████████████████████████████| 100/100 [00:11<00:00,  8.73it/s, t_loss=0.133, loss=0.060, acc=1.000]
2020-09-02 14:39:39 INFO     rasa.utils.tensorflow.models  - Finished training.
2020-09-02 14:39:39 DEBUG    rasa.core.featurizers  - Creating states and action examples from collected trackers (by MaxHistoryTrackerFeaturizer(NoneType))...
Processed trackers: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 462.03it/s, # actions=7]
2020-09-02 14:39:39 DEBUG    rasa.core.featurizers  - Created 7 action examples.
2020-09-02 14:39:39 DEBUG    rasa.core.policies.memoization  - Memorized 0 unique examples.
2020-09-02 14:39:40 INFO     rasa.core.agent  - Persisted model to 'C:\Users\Akhilesh\AppData\Local\Temp\tmpo7v9uef3\core'
Core model training completed.
Training NLU model...
2020-09-02 14:39:42 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en'
2020-09-02 14:40:14 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en'.
2020-09-02 14:40:14 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\healthcare.md' is 'md'.
2020-09-02 14:40:14 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\inform.md' is 'md'.
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  - Training data stats:
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  - Number of intent examples: 360 (2 distinct intents)
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  -   Found intents: 'inform', 'claim_status_enquiry'
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  - Number of entity examples: 323 (3 distinct entities)
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  -   Found entity types: 'npi', 'claim_id', 'name'
2020-09-02 14:40:14 DEBUG    rasa.nlu.training_data.training_data  - Validating training data...
2020-09-02 14:40:14 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component LexicalSyntacticFeaturizer
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2020-09-02 14:40:15 DEBUG    rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer  - No text provided for response attribute in any messages of training data. Skipping training a CountVectorizer for it.
2020-09-02 14:40:16 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:16 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2020-09-02 14:40:16 DEBUG    rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer  - No text provided for response attribute in any messages of training data. Skipping training a CountVectorizer for it.
2020-09-02 14:40:16 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:16 INFO     rasa.nlu.model  - Starting to train component DIETClassifier
2020-09-02 14:40:17 DEBUG    rasa.utils.tensorflow.models  - Building tensorflow train graph...
2020-09-02 14:40:45 DEBUG    rasa.utils.tensorflow.models  - Finished building tensorflow train graph.
Epochs: 100%|█████████████████████████████████| 100/100 [01:01<00:00,  1.63it/s, t_loss=0.784, i_loss=0.001, entity_loss=0.004, i_acc=1.000, entity_f1=0.989]
2020-09-02 14:41:47 INFO     rasa.utils.tensorflow.models  - Finished training.
2020-09-02 14:41:47 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:41:47 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2020-09-02 14:41:47 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:41:49 INFO     rasa.nlu.model  - Successfully saved model into 'C:\Users\Akhilesh\AppData\Local\Temp\tmpo7v9uef3\nlu'
NLU model training completed.
Your Rasa model is trained and saved at 'O:\Office\Chatbot\HealthCareChatbot\Chatbot\latest_vtest_v2\models\20200902-144150.tar.gz'.
My NLU training data:
## intent:inform
- My name is [James](name)
- my name is [Leota](name)
- Ok, it is [Minna](name)
- Its [Donette](name)
- It is [Abel](name)
- My name is oov
- my name is oov
- Ok, it is oov
- Its oov
- It is oov
- oov
- [Louis](name)
- [Josephine](name)
- [Lenna](name)
- [Mitsue](name)
- [Sage](name)
- [Kris](name)
- [Kiley](name)
- [Graciela](name)

My config.yml

language: en
pipeline:
  - name: SpacyNLP
    case_sensitive: False
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    OOV_token: oov
    token_pattern: (?u)\b\w+\b
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
  - name: FormPolicy
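
For readers unfamiliar with the `OOV_token: oov` option in the config above: as far as I understand it, the word-level CountVectorsFeaturizer swaps any token it did not see during training for the OOV token at prediction time, provided the OOV token itself occurred in the training data. The following is a plain-Python sketch of that idea only, not Rasa's actual implementation; `tokenize`, `build_vocabulary` and `replace_oov` are hypothetical helper names.

```python
import re
from typing import Iterable, List, Set

OOV_TOKEN = "oov"
# Same token pattern as in the config above.
TOKEN_PATTERN = re.compile(r"(?u)\b\w+\b")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def build_vocabulary(examples: Iterable[str]) -> Set[str]:
    """Collect every token that appears in the training examples."""
    vocab: Set[str] = set()
    for example in examples:
        vocab.update(tokenize(example))
    return vocab


def replace_oov(text: str, vocab: Set[str]) -> List[str]:
    """Replace tokens missing from the vocabulary with the OOV token."""
    tokens = tokenize(text)
    # Without the OOV token in the training data there is nothing to map to.
    if OOV_TOKEN not in vocab:
        return tokens
    return [t if t in vocab else OOV_TOKEN for t in tokens]


training_examples = ["My name is James", "My name is oov", "Its oov", "oov"]
vocab = build_vocabulary(training_examples)
print(replace_oov("my name is kjasdhfkkjasdf", vocab))
```

The replacement only changes the features the classifier sees. Whether DIETClassifier then tags that token as a `name` entity still depends on the annotated training examples.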

my slot mappings in actions.py

def slot_mappings(self) -> Dict[Text, Union[Dict, List[Dict]]]:
        """A dictionary to map required slots to
            - an extracted entity
            - intent: value pairs
            - a whole message
            or a list of them, where a first match will be picked"""

        return {
            "person_name": [
                self.from_entity(entity="name"),
                self.from_text(intent="inform"),
            ]
        }

@amn41 @MuraliChandran14 @omkarcpatil @flore could you help me?

Hi, @Tanja. I did update to 1.10.11 as you suggested. But the result is the same.

Hello Akhil, make a few changes in the following files.

nlu.md

Add more utterances to your NLU file, like “my name is omkar”, so that Rasa can learn which sentence patterns the “name” entity follows.
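
One way to add more such utterances is to generate them from a few sentence templates and a list of names. A small sketch, assuming the Markdown training-data format shown earlier in this thread; the template list, the name list and the `make_examples` helper are illustrative only:

```python
from itertools import product
from typing import List

# Carrier phrases taken from the inform examples in this thread.
templates = ["My name is {}", "my name is {}", "Ok, it is {}", "Its {}", "It is {}", "{}"]
# Names mentioned in this thread; extend with your own list.
names = ["omkar", "sai", "linga", "Akhil"]


def make_examples(templates: List[str], names: List[str]) -> List[str]:
    """Return Markdown NLU lines with each name annotated as a `name` entity."""
    return [
        "- " + template.format(f"[{name}](name)")
        for template, name in product(templates, names)
    ]


lines = ["## intent:inform"] + make_examples(templates, names)
print("\n".join(lines))
```

The printed lines can be pasted under the inform intent in the NLU file. How many examples are needed for good extraction is something to check empirically with rasa shell nlu.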

actions.py

def slot_mappings(self) -> Dict[Text, Union[Dict, List[Dict]]]:
    """A dictionary to map required slots to
        - an extracted entity
        - intent: value pairs
        - a whole message
        or a list of them, where a first match will be picked"""

    return {
        "person_name": [self.from_text(intent="inform")]
    }

def validate_person_name(
    self,
    value: Text,
    dispatcher: CollectingDispatcher,
    tracker: Tracker,
    domain: Dict[Text, Any],
) -> Dict[Text, Any]:
    """Validate person_name value."""
    # Accept the value only when the "name" slot is set.
    if tracker.get_slot("name"):
        return {"person_name": value}

This function checks whether Rasa NLU extracted an entity; if so, it takes only the entity (the name of the person).

Try making these changes; your code should then work properly.

actionspy file short code for akhil.txt (708 Bytes) I think the code will be more readable in the txt file.

Hi @omkarcpatil.

How is the validation function validate_mak_code different from self.from_entity(entity="name")?

This will also check if an entity called “name” is extracted, right?

Please let me know if my understanding is wrong.

Hello Akhil,

I am really sorry for misleading you.

The function name must be “validate_person_name” (validate_<slot_name>).

I am sending a new file with the code; please refer to that.

actionspy file short code for akhil.txt (768 Bytes)

Hi @omkarcpatil.

I got that. But I don’t understand why we need a validation function when self.from_entity(entity="name") does the same thing in the slot mappings.

That doesn’t always fetch the entity. Check your output of “rasa shell nlu”: Rasa classified the intent as inform with over 99% confidence, but it couldn’t find any entity.
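
To illustrate with the rasa shell nlu output from the question: when "entities" is empty, a from_entity mapping has nothing to fill the slot with, so only a from_text mapping can apply. Below is a minimal sketch of that fallback order on a parse result shaped like the one above. `pick_person_name` is a hypothetical helper for illustration, not part of Rasa or rasa_sdk, and the entity dictionary in the second example is an assumed shape.

```python
from typing import Any, Dict, Optional


def pick_person_name(parse_result: Dict[str, Any]) -> Optional[str]:
    """Prefer an extracted `name` entity; otherwise fall back to the whole text."""
    for entity in parse_result.get("entities", []):
        if entity.get("entity") == "name":
            return entity.get("value")
    # No entity found: use the full message, but only for the inform intent.
    if parse_result.get("intent", {}).get("name") == "inform":
        return parse_result.get("text")
    return None


# Parse result from the question: intent recognised, no entity extracted.
no_entity = {
    "intent": {"name": "inform", "confidence": 0.9997296333312988},
    "entities": [],
    "text": "my name is sai",
}
# Assumed shape of a result where the entity was extracted.
with_entity = {
    "intent": {"name": "inform"},
    "entities": [{"entity": "name", "value": "James"}],
    "text": "My name is James",
}

print(pick_person_name(no_entity))
print(pick_person_name(with_entity))
```

Note that the fallback stores the whole sentence, so a validation function may still need to strip carrier phrases such as “my name is” before saving the slot.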