{
  "language": "en",
  "pipeline": "spacy_sklearn",
  "data": {
    "rasa_nlu_data": {
      "common_examples": [
        {
          "intent": "generic_intent",
          "text": "I224235S109404",
          "entities": [
            { "start": 0, "end": 14, "value": "I224235S109404", "entity": "account_number", "entity_id": 5 }
          ]
        },
        {
          "intent": "generic_intent",
          "text": "NY",
          "entities": [
            { "start": 0, "end": 2, "value": "NY", "entity": "city_name", "entity_id": 6 }
          ]
        },
        {
          "intent": "generic_intent",
          "text": "11210",
          "entities": [
            { "start": 0, "end": 5, "value": "11210", "entity": "zipcode", "entity_id": 4 }
          ]
        },
        {
          "intent": "generic_intent",
          "text": "11211",
          "entities": [
            { "start": 0, "end": 5, "value": "11211", "entity": "zipcode", "entity_id": 4 }
          ]
        },
        { "intent": "greet", "text": "hi", "entities": [] },
        { "intent": "greet", "text": "hello", "entities": [] }
      ],
      "regex_features": [
        { "name": "account_number", "pattern": "I[a-zA-Z0-9]*[a-zA-Z0-9]{13,14}" },
        { "name": "zipcode", "pattern": "[0-9]{5}" }
      ]
    }
  }
}
With the training data above, when I give I224235S109405 as input, the account number is extracted properly.
My problem is this: when I train with more sample data for city names (about 10,000 names),
I224235S109404 is extracted as a city name instead:
"entities": [
  {
    "start": 0,
    "end": 14,
    "value": "i212851s105704",
    "entity": "city_name",
    "confidence": 0.9774771019143172,
    "extractor": "ner_crf"
  }
],
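To rule out the patterns themselves, here is a minimal sketch that checks the two regex_features patterns outside Rasa, using only Python's `re` module. The helper name `matching_features` is hypothetical and only for illustration; this is not how Rasa applies the patterns internally, just a sanity check of what each pattern matches.

```python
import re

# Patterns copied verbatim from the regex_features section of the training data.
REGEX_FEATURES = {
    "account_number": r"I[a-zA-Z0-9]*[a-zA-Z0-9]{13,14}",
    "zipcode": r"[0-9]{5}",
}


def matching_features(text):
    """Return the names of the patterns that match the whole text.

    Hypothetical helper for illustration; not part of Rasa NLU.
    """
    return [
        name
        for name, pattern in REGEX_FEATURES.items()
        if re.fullmatch(pattern, text)
    ]


# The last sample is the lowercased value that appears in the parser output.
for sample in ["I224235S109404", "NY", "11210", "i212851s105704"]:
    print(sample, matching_features(sample))
```

The account_number pattern matches the original input, but it does not match the lowercased value shown in the output, because the pattern requires an uppercase `I`. Whether the text is lowercased before the regex features are computed is an assumption I have not verified.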
Please advise how to solve this issue.