How to use OOV to extract any person name?

Akhil · September 1, 2020, 12:53pm

Rasa Version: 1.10.11

Hi, I am trying to extract a person’s name from a word or sentence and set it into a slot.

I want the bot to handle all the following 5 cases.
example 1:
Bot: Please enter your name?
User: Akhil

example 2:
Bot: Please enter your name?
User: asldfkaskdfh

example 3:
Bot: Please enter your name?
User: My name is Akhil.

example 4:
Bot: Please enter your name?
User: My name is kjasdhfkkjasdf

example 5:
Bot: Please enter your name?
User: It’s asdfasdf

I’ve seen the OOV token used in the Sara bot and implemented it in a similar way, but the bot still fails to extract many Indian names.

The output of rasa shell nlu:

Next message:
my name is sai
{
  "intent": {
    "name": "inform",
    "confidence": 0.9997296333312988
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "inform",
      "confidence": 0.9997296333312988
    },
    {
      "name": "claim_status_enquiry",
      "confidence": 0.0002703829959500581
    }
  ],
  "text": "my name is sai"
}
Next message:
my name is linga
{
  "intent": {
    "name": "inform",
    "confidence": 0.9999778270721436
  },
  "entities": [],
  "intent_ranking": [
    {
      "name": "inform",
      "confidence": 0.9999778270721436
    },
    {
      "name": "claim_status_enquiry",
      "confidence": 2.2214911950868554e-05
    }
  ],
  "text": "my name is linga"
}
Next message:

I get the following logs while training

(tensorflow) PS O:\Office\Chatbot\HealthCareChatbot\Chatbot\latest_vtest_v2> rasa train --debug
2020-09-02 14:38:38 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\core\healthcare.md' is 'unk'.
2020-09-02 14:38:38 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\healthcare.md' is 'md'.
2020-09-02 14:38:38 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\inform.md' is 'md'.
2020-09-02 14:38:38 DEBUG    pykwalify.compat  - Using yaml library: c:\users\akhilesh\.conda\envs\tensorflow\lib\site-packages\ruamel\yaml\__init__.py
2020-09-02 14:38:39 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\healthcare.md' is 'md'.
2020-09-02 14:38:39 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\inform.md' is 'md'.
2020-09-02 14:38:39 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\healthcare.md' is 'md'.
2020-09-02 14:38:39 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\inform.md' is 'md'.
2020-09-02 14:38:41 DEBUG    rasa.model  - Extracted model to 'C:\Users\Akhilesh\AppData\Local\Temp\tmpmsrdo7jr'.
2020-09-02 14:38:42 INFO     rasa.model  - Data (version) for Core model section changed.
2020-09-02 14:38:42 INFO     rasa.model  - Data (version) for NLU model section changed.
Training Core model...
2020-09-02 14:38:53 DEBUG    rasa.core.nlg.generator  - Instantiated NLG to 'TemplatedNaturalLanguageGenerator'.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Generated trackers will be deduplicated based on their unique last 5 states.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Number of augmentation rounds is 3
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Starting data generation round 0 ... (with 1 trackers)
Processed Story Blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s, # trackers=1]
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Finished phase (1 training samples found).
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Data generation rounds finished.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Found 0 unused checkpoints
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Starting augmentation round 0 ... (with 1 trackers)
Processed Story Blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s, # trackers=1]
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Finished phase (2 training samples found).
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Starting augmentation round 1 ... (with 2 trackers)
Processed Story Blocks: 100%|███████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 969.33it/s, # trackers=1]
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Finished phase (4 training samples found).
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Starting augmentation round 2 ... (with 3 trackers)
Processed Story Blocks: 100%|████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<?, ?it/s, # trackers=1]
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Finished phase (6 training samples found).
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Found 6 training trackers.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - Subsampled to 5 augmented training trackers.
2020-09-02 14:38:53 DEBUG    rasa.core.training.generator  - There are 1 original trackers.
2020-09-02 14:38:53 DEBUG    rasa.core.agent  - Agent trainer got kwargs: {}
2020-09-02 14:38:53 DEBUG    rasa.core.featurizers  - Creating states and action examples from collected trackers (by MaxHistoryTrackerFeaturizer(NoneType))...
Processed trackers: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 256.41it/s, # actions=7]
2020-09-02 14:38:53 DEBUG    rasa.core.featurizers  - Created 7 action examples.
Processed actions: 7it [00:00, 87.17it/s, # examples=7]
2020-09-02 14:38:53 DEBUG    rasa.core.policies.memoization  - Memorized 7 unique examples.
2020-09-02 14:38:53 DEBUG    rasa.core.featurizers  - Creating states and action examples from collected trackers (by MaxHistoryTrackerFeaturizer(LabelTokenizerSingleStateFeaturizer))...
Processed trackers: 100%|████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 82.10it/s, # actions=11]
2020-09-02 14:38:53 DEBUG    rasa.core.featurizers  - Created 11 action examples.
2020-09-02 14:39:09 DEBUG    rasa.utils.tensorflow.models  - Building tensorflow train graph...
2020-09-02 14:39:27 DEBUG    rasa.utils.tensorflow.models  - Finished building tensorflow train graph.
Epochs: 100%|█████████████████████████████████████████████████████████████████████████| 100/100 [00:11<00:00,  8.73it/s, t_loss=0.133, loss=0.060, acc=1.000]
2020-09-02 14:39:39 INFO     rasa.utils.tensorflow.models  - Finished training.
2020-09-02 14:39:39 DEBUG    rasa.core.featurizers  - Creating states and action examples from collected trackers (by MaxHistoryTrackerFeaturizer(NoneType))...
Processed trackers: 100%|████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 462.03it/s, # actions=7]
2020-09-02 14:39:39 DEBUG    rasa.core.featurizers  - Created 7 action examples.
2020-09-02 14:39:39 DEBUG    rasa.core.policies.memoization  - Memorized 0 unique examples.
2020-09-02 14:39:40 INFO     rasa.core.agent  - Persisted model to 'C:\Users\Akhilesh\AppData\Local\Temp\tmpo7v9uef3\core'
Core model training completed.
Training NLU model...
2020-09-02 14:39:42 INFO     rasa.nlu.utils.spacy_utils  - Trying to load spacy model with name 'en'
2020-09-02 14:40:14 INFO     rasa.nlu.components  - Added 'SpacyNLP' to component cache. Key 'SpacyNLP-en'.
2020-09-02 14:40:14 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\healthcare.md' is 'md'.
2020-09-02 14:40:14 DEBUG    rasa.nlu.training_data.loading  - Training data format of 'data\nlu\inform.md' is 'md'.
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  - Training data stats:
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  - Number of intent examples: 360 (2 distinct intents)
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  -   Found intents: 'inform', 'claim_status_enquiry'
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  - Number of response examples: 0 (0 distinct responses)
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  - Number of entity examples: 323 (3 distinct entities)
2020-09-02 14:40:14 INFO     rasa.nlu.training_data.training_data  -   Found entity types: 'npi', 'claim_id', 'name'
2020-09-02 14:40:14 DEBUG    rasa.nlu.training_data.training_data  - Validating training data...
2020-09-02 14:40:14 INFO     rasa.nlu.model  - Starting to train component SpacyNLP
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component SpacyTokenizer
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component SpacyFeaturizer
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component RegexFeaturizer
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component LexicalSyntacticFeaturizer
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:15 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2020-09-02 14:40:15 DEBUG    rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer  - No text provided for response attribute in any messages of training data. Skipping training a CountVectorizer for it.
2020-09-02 14:40:16 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:16 INFO     rasa.nlu.model  - Starting to train component CountVectorsFeaturizer
2020-09-02 14:40:16 DEBUG    rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer  - No text provided for response attribute in any messages of training data. Skipping training a CountVectorizer for it.
2020-09-02 14:40:16 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:40:16 INFO     rasa.nlu.model  - Starting to train component DIETClassifier
2020-09-02 14:40:17 DEBUG    rasa.utils.tensorflow.models  - Building tensorflow train graph...
2020-09-02 14:40:45 DEBUG    rasa.utils.tensorflow.models  - Finished building tensorflow train graph.
Epochs: 100%|█████████████████████████████████| 100/100 [01:01<00:00,  1.63it/s, t_loss=0.784, i_loss=0.001, entity_loss=0.004, i_acc=1.000, entity_f1=0.989]
2020-09-02 14:41:47 INFO     rasa.utils.tensorflow.models  - Finished training.
2020-09-02 14:41:47 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:41:47 INFO     rasa.nlu.model  - Starting to train component EntitySynonymMapper
2020-09-02 14:41:47 INFO     rasa.nlu.model  - Finished training component.
2020-09-02 14:41:49 INFO     rasa.nlu.model  - Successfully saved model into 'C:\Users\Akhilesh\AppData\Local\Temp\tmpo7v9uef3\nlu'
NLU model training completed.
Your Rasa model is trained and saved at 'O:\Office\Chatbot\HealthCareChatbot\Chatbot\latest_vtest_v2\models\20200902-144150.tar.gz'.

## intent:inform
- My name is [James](name)
- my name is [Leota](name)
- Ok, it is [Minna](name)
- Its [Donette](name)
- It is [Abel](name)
- My name is oov
- my name is oov
- Ok, it is oov
- Its oov
- It is oov
- oov
- [Louis](name)
- [Josephine](name)
- [Lenna](name)
- [Mitsue](name)
- [Sage](name)
- [Kris](name)
- [Kiley](name)
- [Graciela](name)

My config.yml

language: en
pipeline:
  - name: SpacyNLP
    case_sensitive: False
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    OOV_token: oov
    token_pattern: (?u)\b\w+\b
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
  - name: FormPolicy
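
As a rough illustration of what the OOV_token setting in the first CountVectorsFeaturizer is meant to do: going by the Rasa documentation, words not seen in training are substituted with the OOV token at prediction time, which is why the training data contains literal "oov" examples. The sketch below is a toy model of that idea in plain Python, not Rasa's implementation. The function names and the whitespace tokenization are assumptions made for the example.

```python
from typing import List, Set

OOV_TOKEN = "oov"  # same value as OOV_token in the config above


def build_vocabulary(training_sentences: List[str]) -> Set[str]:
    """Collect every lowercased token seen in the training sentences."""
    vocabulary: Set[str] = set()
    for sentence in training_sentences:
        vocabulary.update(sentence.lower().split())
    return vocabulary


def replace_unknown(text: str, vocabulary: Set[str]) -> List[str]:
    """Map every token outside the vocabulary to the OOV token."""
    return [
        token if token in vocabulary else OOV_TOKEN
        for token in text.lower().split()
    ]


vocab = build_vocabulary(["my name is oov", "it is oov", "my name is james"])
print(replace_unknown("my name is linga", vocab))
```

Note that this only changes the word-level features the classifier sees. Whether DIETClassifier then tags the unknown token as a name entity is a separate question, which is what the empty "entities" lists above show.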

My slot mappings in actions.py

def slot_mappings(self) -> Dict[Text, Union[Dict, List[Dict]]]:
        """A dictionary to map required slots to
            - an extracted entity
            - intent: value pairs
            - a whole message
            or a list of them, where a first match will be picked"""

        return {
            "person_name": [
                self.from_entity(entity="name"),
                self.from_text(intent="inform"),
            ]
        }

Akhil · September 1, 2020, 12:59pm

@amn41 @MuraliChandran14 @omkarcpatil @flore could you help me?

Akhil · September 2, 2020, 9:26am

Hi, @Tanja. I did update to 1.10.11 as you suggested. But the result is the same.

omkarcpatil · September 7, 2020, 6:35am

Hello Akhil, make a few changes in the following files:

nlu.md

Add more utterances to your NLU file, like “my name is omkar”, so that Rasa can learn which phrases the “name” entity follows.
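
One way to get more utterances without typing each one by hand is to generate them from a few templates and a list of names. This is only a sketch: build_examples is a hypothetical helper, the templates and names are taken from messages in this thread, and the output uses the Markdown training format shown in the inform intent above.

```python
from itertools import product
from typing import List


def build_examples(templates: List[str], names: List[str]) -> List[str]:
    """Fill each template with each name, annotated as a `name` entity."""
    lines = []
    for template, name in product(templates, names):
        annotated = f"[{name}](name)"
        lines.append("- " + template.format(name=annotated))
    return lines


templates = ["my name is {name}", "it is {name}", "{name}"]
names = ["sai", "linga", "omkar"]

print("## intent:inform")
print("\n".join(build_examples(templates, names)))
```

The printed lines can be pasted into the NLU file. A longer and more varied list of names should give the entity extractor more to learn from than the handful of examples currently in the file.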

actions.py

def slot_mappings(self) -> Dict[Text, Union[Dict, List[Dict]]]:
    """A dictionary to map required slots to
        - an extracted entity
        - intent: value pairs
        - a whole message
        or a list of them, where a first match will be picked"""

    return {
        "person_name": [self.from_text(intent="inform")]
    }

def validate_person_name(
    self,
    value: Text,
    dispatcher: CollectingDispatcher,
    tracker: Tracker,
    domain: Dict[Text, Any],
) -> Dict[Text, Any]:
    """Validate person_name value."""
    # Only accept the value when the "name" slot has been filled.
    if tracker.slots["name"]:
        return {"person_name": value}

This function checks whether Rasa NLU extracted an entity; if it did, only the entity (the name of the person) is used.
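
The idea described above (prefer the extracted entity, otherwise fall back to the message text) can be sketched as a plain helper. pick_person_name is a hypothetical name, and the entities argument is assumed to have the shape of the "entities" list in the rasa shell nlu output earlier in the thread, with "entity" and "value" keys on each item.

```python
from typing import Any, Dict, List, Optional, Text


def pick_person_name(entities: List[Dict[Text, Any]], text: Text) -> Optional[Text]:
    """Return the first `name` entity value, else the stripped message text."""
    for entity in entities:
        # Assumed entity shape: {"entity": "name", "value": "Akhil", ...}
        if entity.get("entity") == "name" and entity.get("value"):
            return str(entity["value"]).strip()
    cleaned = text.strip()
    return cleaned or None  # None when the message is empty


print(pick_person_name([{"entity": "name", "value": "Akhil"}], "My name is Akhil"))
print(pick_person_name([], "asldfkaskdfh"))
```

Inside validate_person_name this could be called with the entities of the latest message and the slot value, assuming the tracker exposes them as tracker.latest_message.get("entities", []).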

Try these changes and your code should work properly.

actionspy file short code for akhil.txt (708 Bytes). I think the code in the txt file will be more readable.

Akhil · September 7, 2020, 8:10am

Hi @omkarcpatil.

How is the validation function validate_mak_code different from self.from_entity(entity="name")?

This will also check if an entity called “name” is extracted, right?

Please lemme know if my understanding is wrong.

omkarcpatil · September 7, 2020, 8:39am

Hello Akhil,

I am really sorry for misleading you.

The function name must be “validate_person_name” (validate_<slot_name>).

I am sending a new file with the code, please refer to that.

actionspy file short code for akhil.txt (768 Bytes)

Akhil · September 7, 2020, 10:47am

Hi @omkarcpatil.

I got that. But I don’t understand why we need a validation function when self.from_entity(entity="name") does the same thing in the slot mappings.

omkarcpatil · September 8, 2020, 5:32am

That doesn’t always fetch the entity. Check your output of “rasa shell nlu”: Rasa got the intent inform with 99% confidence, but it couldn’t find the entity.
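
For the cases where no entity comes back, one possible fallback is a rule-based cleanup of the whole message instead of anything OOV-based. The sketch below strips a leading carrier phrase and trailing punctuation. strip_carrier_phrase is a hypothetical helper, and the phrase list is an assumption built only from the examples in this thread, so it would need extending for real traffic.

```python
import re

# Hypothetical carrier phrases, taken from the examples in this thread.
_PREFIX = re.compile(
    r"^\s*(?:ok,?\s+)?(?:my\s+name\s+is|it\s+is|it['’]?s)\s+",
    re.IGNORECASE,
)


def strip_carrier_phrase(text: str) -> str:
    """Remove a leading 'my name is' / 'it is' / 'its' and trailing punctuation."""
    without_prefix = _PREFIX.sub("", text, count=1)
    return without_prefix.strip().rstrip(".!?").strip()


for message in ["Akhil", "My name is Akhil.", "It’s asdfasdf", "Ok, it is Minna"]:
    print(repr(strip_carrier_phrase(message)))
```

A validation function could apply this to the text captured by from_text when the name entity is missing, so that the slot holds the name rather than the full sentence.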
