Hello
I am trying to specify a list of neighborhoods in a lookup file.
The detected entities are sometimes merged into a single, longer string that contains several neighborhoods.
When I send this message to the parse endpoint:
{
  "text": "any place in New York except for Bronx Harlem Queens Stapleton"
}
I get back these entities:
{
  "entity": "PERSON",
  "value": "Harlem",
  "start": 39,
  "confidence": null,
  "end": 45,
  "extractor": "SpacyEntityExtractor"
},
{
  "entity": "PERSON",
  "value": "Stapleton",
  "start": 53,
  "confidence": null,
  "end": 62,
  "extractor": "SpacyEntityExtractor"
},
{
  "start": 13,
  "end": 21,
  "value": "New York",
  "entity": "location",
  "confidence": 0.9529347316,
  "extractor": "CRFEntityExtractor"
},
{
  "start": 33,
  "end": 62,
  "value": "Bronx Harlem Queens Stapleton",
  "entity": "location",
  "confidence": 0.9446223897,
  "extractor": "CRFEntityExtractor"
},
...
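One plausible reason for the merge, offered as an assumption rather than a reading of Rasa's source: lookup entries are typically compiled into a word-bounded regex and handed to the CRF as a per-token "matches lookup" feature. Every token in "Bronx Harlem Queens Stapleton" then gets the same flag, and nothing in that feature says where one neighborhood ends and the next begins. The sketch below reproduces that per-token view; the neighborhood list is a hypothetical excerpt of the lookup file, and `lookup_regex` / `token_flags` are illustrative helpers, not Rasa APIs.

```python
import re

# Hypothetical excerpt of the neighborhoods lookup file (one name per line).
NEIGHBORHOODS = ["Bronx", "Harlem", "Queens", "Stapleton", "Upper East Side"]


def lookup_regex(elements):
    """Compile lookup entries into one case-insensitive, word-bounded pattern."""
    alternatives = "|".join(r"\b" + re.escape(e) + r"\b" for e in elements)
    return re.compile("(" + alternatives + ")", re.IGNORECASE)


def token_flags(text, pattern):
    """Return (token, start, end, in_lookup) for each whitespace token.

    A token is flagged when it overlaps any lookup match, which is roughly
    how a regex-based featurizer turns matches into per-token features.
    """
    spans = [m.span() for m in pattern.finditer(text)]
    flags = []
    for tok in re.finditer(r"\S+", text):
        s, e = tok.span()
        hit = any(s < me and ms < e for ms, me in spans)
        flags.append((tok.group(), s, e, hit))
    return flags


text = "any place in New York except for Bronx Harlem Queens Stapleton"
for token, start, end, hit in token_flags(text, lookup_regex(NEIGHBORHOODS)):
    print(f"{token:10} {start:>2}-{end:<2} {'in_lookup' if hit else ''}")
```

The last four tokens (offsets 33 to 62) all come out flagged with nothing between them, which matches the single `location` span the CRF returned.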
If I add commas between the neighborhood names, everything works. But we can’t rely on our users doing that. Is there any way to tell Rasa to split entities when they are matched against the lookup tables?
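A possible workaround, sketched under assumptions: after parsing, split any merged `location` entity back into the lookup entries it is made of, using a greedy longest match over its tokens. `split_entity` and the neighborhood set are hypothetical names for illustration, not part of Rasa; in practice this logic would live in a custom pipeline component or in the code that consumes the parse result.

```python
import re

# Hypothetical lookup entries, compared case-insensitively.
NEIGHBORHOODS = {"bronx", "harlem", "queens", "stapleton", "upper east side"}
MAX_WORDS = max(len(n.split()) for n in NEIGHBORHOODS)


def split_entity(entity, text, lookup=NEIGHBORHOODS, max_words=MAX_WORDS):
    """Split one extracted entity into consecutive lookup entries.

    Greedy longest match: at each token, take the longest run of up to
    `max_words` tokens that is a lookup entry. If some token cannot be
    covered, the original entity is returned unchanged.
    """
    offset = entity["start"]
    span = text[offset:entity["end"]]
    tokens = [(m.start() + offset, m.end() + offset) for m in re.finditer(r"\S+", span)]
    parts, i = [], 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            s, e = tokens[i][0], tokens[i + n - 1][1]
            candidate = " ".join(text[a:b] for a, b in tokens[i:i + n]).lower()
            if candidate in lookup:
                # Copy the other fields (entity, confidence, extractor) as-is.
                parts.append(dict(entity, start=s, end=e, value=text[s:e]))
                i += n
                break
        else:
            return [entity]  # a token is not in the lookup: keep the entity whole
    return parts


text = "any place in New York except for Bronx Harlem Queens Stapleton"
merged = {"start": 33, "end": 62, "value": "Bronx Harlem Queens Stapleton",
          "entity": "location", "confidence": 0.9446223897,
          "extractor": "CRFEntityExtractor"}
for part in split_entity(merged, text):
    print(part["value"], part["start"], part["end"])
# Bronx 33 38
# Harlem 39 45
# Queens 46 52
# Stapleton 53 62
```

Entities that are not built from lookup entries, such as "New York" here, pass through unchanged. The greedy longest match is one simple choice; it prefers a multi-word entry like "Upper East Side" over its individual words.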
Adding @nbeuchat