Indian name recognition.(name entity recognition) (Regional name recognition) works best in recognition of name

sheggam_harshith · June 2, 2020, 6:54pm

Hey @community @tyd @stephens, I was looking for the best and easiest way to do name recognition for both Indian and international names.

Existing pipelines

Many pipelines are not very good at grabbing the name entity. spaCy works for US-origin or English names, but Indian names are typed as transliterations from the person's native language into English, so all the pipelines fail to grab the name entity.

So my solution for getting the bot to recognise Indian names is to use a custom CRFEntityExtractor together with SpacyEntityExtractor. Basically we use a composite of these two entity extractors. The config:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: "en"

pipeline:
  - name: SpacyNLP
    model: "en_core_web_lg"
    case_sensitive: False
  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CRFEntityExtractor
  - name: SpacyEntityExtractor
    dimensions: ["PERSON", "ORG"]
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 25
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 25

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 25
  - name: MappingPolicy
  - name: FormPolicy
  - name: "FallbackPolicy"
    nlu_threshold: 0.4
    core_threshold: 0.3
    fallback_action_name: "action_default_fallback"

Above is the config file I used in my project, but you will still have problems recognising names, because neither CRF nor spaCy is meant to grab Indian names.

For that you need a good dataset for Indian name entity recognition. This is the nlu.md file I used for my project:

## intent:inform

my name is akshith
my name is Chaitanya
my name is damodar
my name is ekani
my name is fharan
my name is ganesh
my name is hyper
my name is indra
my name is Jaques
my name is Jayanand
my name is Kanta
my name is Laksman
my name is Madhukar
my name is Nagesh
my name is Om
my name is Panduranga
my name is Raju
my name is Swami
my name is bargav he’s
my name is surya and i’m going to have a baby!
my name is harshith
my name is harshith he’s
my name’s rama chandra your is ralph, i think
my name is durga and i’m going to be honest with you
my name’s priya and i hate my a lot, you know?
i have to tell you, my name’s harichandra
my name is ramesh i 'll be right back
my name’s suresh my name’s coldblooded!
my name’s ramchandra
my name is shiva and i’m very much looking to meet you
i have a person_name’s niklesh
my name’s reddy and i have no idea whose house i’m at,
my name is naveen namani
my name’s harshith
my name’s surya
my name’s harshith i’m the one who taught him how to kill

You may wonder why I have given a dataset where every example of the intent has the same shape:

my name is -----------------

If you look closely, the names in the dataset follow alphabetical order, from A to Z:

my name is akshith
my name is Chaitanya
my name is damodar
my name is ekani
my name is fharan
…

Basically this works with the CRFEntityExtractor, which is the custom entity extractor here, so it will extract Indian names from strings like "my name is …" and so on.

To cover the case where the user enters only their name:

akshith
baragav
charan
damaodar
farahan
ganesh
jagadesh
kavitha

Give the training data in the NLU file like this, with random Indian names in alphabetical order that cover A to Z.
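A minimal sketch of how training data with A-to-Z coverage could be generated from a plain list of names, instead of typing every example by hand. The helper names (`pick_names_per_letter`, `to_nlu_markdown`) and the `TEMPLATES` list are hypothetical, and the output assumes the Rasa 1.x Markdown format with `[name](entity)` annotations; adjust it to whatever format your version expects.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical sentence templates; "{}" alone covers users typing only a name.
TEMPLATES = ["my name is {}", "my name's {}", "{}"]


def pick_names_per_letter(names: Iterable[str], per_letter: int = 2) -> List[str]:
    """Keep up to `per_letter` names for each initial letter, sorted A to Z."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        name = name.strip()
        if not name:
            continue
        initial = name[0].lower()
        if len(buckets[initial]) < per_letter:
            buckets[initial].append(name)
    return [n for letter in sorted(buckets) for n in buckets[letter]]


def to_nlu_markdown(names: Iterable[str], intent: str = "inform",
                    entity: str = "person_name") -> str:
    """Render annotated examples for one intent in Markdown NLU style."""
    lines = [f"## intent:{intent}"]
    for name in names:
        for template in TEMPLATES:
            lines.append("- " + template.format(f"[{name}]({entity})"))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = ["akshith", "bargav", "Chaitanya", "damodar", "charan"]
    print(to_nlu_markdown(pick_names_per_letter(sample, per_letter=1)))
```

Capping the number of names per letter keeps the examples balanced across the alphabet, which is the point of the A-to-Z ordering.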

Still not sure about your bot? To boost your confidence level, add a lookup table with a dataset: names.txt (463.4 KB)

Add this file to the dataset in nlu.md.

This increases the probability of recognising more than 18k Indian names. (Don't complain that this defeats the point of deep learning; the difficulty is that Indian names originate from native languages.)

## lookup:person_name

data/names.txt
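Conceptually, a lookup table acts like one large word-bounded pattern built from the names file. The sketch below only illustrates that idea in plain Python and is not Rasa's own code; `clean_names` and `build_lookup_pattern` are hypothetical helpers, and it assumes the names file holds one name per line.

```python
import re
from typing import Iterable, List


def clean_names(raw_lines: Iterable[str]) -> List[str]:
    """Strip whitespace, drop blanks and case-insensitive duplicates, keep order."""
    seen = set()
    cleaned = []
    for line in raw_lines:
        name = line.strip()
        key = name.lower()
        if name and key not in seen:
            seen.add(key)
            cleaned.append(name)
    return cleaned


def build_lookup_pattern(names: Iterable[str]) -> "re.Pattern":
    """Build one case-insensitive, word-bounded alternation over all names."""
    # Longest names first, so "rama chandra" is preferred over "rama".
    ordered = sorted(names, key=len, reverse=True)
    alternation = "|".join(re.escape(n) for n in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


if __name__ == "__main__":
    names = clean_names(["harshith\n", "Harshith\n", "\n", "rama chandra\n", "rama\n"])
    lookup = build_lookup_pattern(names)
    print(names)
    print(lookup.findall("my name's Rama Chandra"))
```

Cleaning the file first matters with 18k entries: duplicates and blank lines only make the pattern bigger without helping recognition.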

OK, cool, then why should we still use the spaCy entity extractor? Note that the above works only for Indian names. To make it international, spaCy is good at English names, so we should also use SpacyEntityExtractor.

But you will then face a problem with the action server not extracting names. The problem is not in the extractor, it is in the actions.py code.

As I mentioned earlier, we are using two entity extractors, spaCy and CRF. If both extractors extract the entity, the form action receives the values as a list, and you will have problems validating your entity. I have a solution for that, which also helps if you want to validate slots other than the name slot:

import re
from typing import Any, Dict, List, Text, Union

from rasa_sdk import Tracker
from rasa_sdk.executor import CollectingDispatcher
from rasa_sdk.forms import FormAction


class loanForm(FormAction):

    def name(self):
        return "loan_form"

    @staticmethod
    def required_slots(tracker):
        return [
            "person_name",
            "email_id",
            "type_loan",
            "phone_number",
        ]

    @staticmethod
    def _as_list(value: Union[Text, List[Text]]) -> List[Text]:
        # If both extractors fire, the slot value arrives as a list, not a str.
        return [value] if isinstance(value, str) else list(value)

    def validate_phone_number(
        self,
        value: Union[Text, List[Text]],
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> Dict[Text, Any]:
        p2 = pattern()
        # Accept the first candidate that looks like a phone number.
        for candidate in self._as_list(value):
            if p2.search_phone_number(string=candidate):
                return {"phone_number": candidate}
        dispatcher.utter_message(text="That's an incorrect format, please enter a valid format")
        return {"phone_number": None}

    def validate_person_name(
        self,
        value: Union[Text, List[Text]],
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> Dict[Text, Any]:
        p3 = pattern()
        # Accept the first candidate that is not made up purely of digits.
        for candidate in self._as_list(value):
            if not p3.search_phone_number(r"^\d+$", candidate):
                return {"person_name": candidate}
        return {"person_name": None}

    def validate_email_id(
        self,
        value: Union[Text, List[Text]],
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> Dict[Text, Any]:
        p = pattern()
        # Accept the first candidate that looks like an email address.
        for candidate in self._as_list(value):
            if p.email_id_search(slot_value=candidate):
                return {"email_id": candidate}
        dispatcher.utter_message(text="Please enter a valid email id format, e.g. xyz@xyz.com")
        return {"email_id": None}

    def slot_mappings(self):
        return {
            "person_name": [
                self.from_entity(entity="person_name", intent="inform"),
                self.from_entity(entity="PERSON", intent="inform"),
                self.from_entity(entity="ORG", intent="inform"),
                # Fallback: if both entity extractors fail, the raw text is
                # the only thing left to fill the slot with.
                self.from_text(intent="inform"),
                self.from_text(intent="greet"),
            ]
        }


# Regex helpers: validate the phone number and email, and check that a name
# does not consist purely of digits.
class pattern:
    def search_phone_number(
        self,
        pattern=r'^(\+\d{1,2}\s?)?1?\-?\.?\s?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$',
        string=None,
    ):
        # True if `string` matches `pattern` (a phone number by default).
        return re.compile(pattern).search(string) is not None

    def email_id_search(
        self,
        pattern=r"^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$",
        slot_value=None,
    ):
        # True if `slot_value` looks like an email address.
        return re.compile(pattern).search(slot_value) is not None

So this is my code for validating entities, because you may not get a simple string: a list is returned if both extractors recognise the entity.

You may notice that I have also mapped the greet intent to the person_name slot. Why? My observation was that many South Korean names look like "Ching - haee", and the bot classifies them as greet with a confidence of 0.999. To avoid this I added greet to the person_name slot mapping. It is purely optional, because it also fills "hi" into the slot, so you can remove it.

The above method basically worked for me, and I hope it works with all Indian names. You can test the bot over here:
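The list-or-string handling can also be pulled out into a small helper that does not depend on the form class. This is only a sketch under that assumption; `candidates`, `first_valid` and `looks_like_name` are hypothetical names, not part of Rasa.

```python
from typing import Callable, List, Optional, Union

SlotValue = Union[str, List[str], None]


def candidates(value: SlotValue) -> List[str]:
    """Return the slot value as a list, whatever shape the extractors produced."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [v for v in value if isinstance(v, str)]


def first_valid(value: SlotValue, is_valid: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate accepted by `is_valid`, or None if all fail."""
    for candidate in candidates(value):
        if is_valid(candidate):
            return candidate
    return None


def looks_like_name(text: str) -> bool:
    # Same rule as in the thread: a name must not consist purely of digits.
    stripped = text.strip()
    return bool(stripped) and not stripped.isdigit()


if __name__ == "__main__":
    print(first_valid(["12345", "surya"], looks_like_name))
```

Checking every candidate before giving up means one bad extraction from spaCy does not throw away a good one from CRF, or the other way round.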

http://bigdatamatica.tk

RBenjaminfranklin · June 30, 2020, 1:57pm

Hi @sheggam_harshith, nice solution there. I am trying to use the lookup table with the names file you provided, but it is not recognising it.

## intent:name_entry

- [Benjamin](name)

- [Peter](name)

- [Jitendra](name)

- [Sam](name)

- [Ankit](name)

- [John](name)

- [Jiterder Sagar](name)

- [R Benjamin Franklin](name)

- [Ankit Kumar Mishra](name)

- [ankit](name)

- [Daniel](name)

- [Amit](name)

- [Rohit](name)

- my name is [sam](name)

- i am [roshan](name)

- i am [jhonny](name)

- myself [boris johnson](name)

- i call myself [rahul](name)

- I am [Suraj](name)

- this is [neha](name)

- [kajal](name) here

- This is [bigan mehto](name)

- [kiren](name) here

- [seema](name) here

- my name is [farheen](name)

## lookup:name

    data\names.txt

Can you tell me what the reason could be?

sheggam_harshith · June 30, 2020, 2:20pm

Can you post the pipeline you used in your project, and the Core and NLU output?

RBenjaminfranklin · June 30, 2020, 2:32pm

sure @sheggam.

# Configuration for Rasa NLU.
# Components
language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

# Configuration for Rasa Core.
# Policies
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy

The output comes out as "Hello None!" (it should be "Hello name_of_person").

sheggam_harshith · July 17, 2020, 5:22am

# Configuration for Rasa NLU.
# Components
language: "en"

pipeline:
  - name: SpacyNLP
    model: "en_core_web_lg"
    case_sensitive: False
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
    pooling: "mean"
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: DucklingHTTPExtractor
    url: http://localhost:8000
    dimensions:
      - email
  - name: SpacyEntityExtractor
    dimensions: ["PERSON", "MONEY", "ORG"]
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 25
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 25

policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 25
  - name: MappingPolicy
  - name: FormPolicy
  - name: "FallbackPolicy"
    nlu_threshold: 0.4
    core_threshold: 0.3
    fallback_action_name: "action_default_fallback"

Akhil · September 10, 2020, 2:10am

Hi @sheggam_harshith. Great solution. Thanks.

Any reason you used CRFEntityExtractor + Spacy instead of DIETClassifier + Spacy? Which one is a better custom entity extractor for Indian names, CRF or DIET?

Shan · July 6, 2021, 9:21am

Could you please share your nlu and domain screenshots.

ramakrishna · November 21, 2023, 7:35am

Hi @RBenjaminfranklin, I am also facing the same issue. If you resolved it, please help me.
