Entity Extraction for Phrases with Commas

ajgreen630 · August 17, 2021, 4:57pm

Is there a way to extract entities for phrases that contain commas, other than regex entity extraction? I have a lookup table with a list of values (with plenty of training examples for each), and a few of those values are phrases with a comma in the middle. When such an entity is extracted, I need the phrase preserved exactly as entered, because the value is passed on to query a database.

For example, if I have a few company names in my lookup table:

- lookup:
  examples: |
    - Cool Tech
    - Corporation Inc.
    - Business Name, LLC
    - Business Name, LLC (subdivision ab)

I want to be able to extract “Business Name, LLC” as a single entity rather than extracting “Business Name” and “LLC” as two separate entities with the same entity label.
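A minimal sketch of what "one entity, commas included" could look like in plain Python, outside Rasa. The lookup values are escaped and tried longest first, so a comma phrase wins over its own prefix. `build_lookup_regex` and `extract_companies` are hypothetical helpers, and the lookarounds `(?<!\w)` / `(?!\w)` stand in for word boundaries so values ending in "." or ")" still match; this is not a description of how Rasa's RegexEntityExtractor builds its patterns.

```python
import re

# Lookup values taken from the example table above.
COMPANIES = [
    "Cool Tech",
    "Corporation Inc.",
    "Business Name, LLC",
    "Business Name, LLC (subdivision ab)",
]


def build_lookup_regex(values):
    """Hypothetical helper: one alternation over all values, longest first."""
    # Longest first, so "Business Name, LLC (subdivision ab)" is tried
    # before its prefix "Business Name, LLC".
    ordered = sorted(values, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in ordered)
    # Lookarounds instead of \b: they only forbid a word character directly
    # before/after the match, so values ending in "." or ")" still match.
    return re.compile(rf"(?<!\w)(?:{alternation})(?!\w)")


def extract_companies(text, pattern):
    """Hypothetical helper: return every match verbatim, commas included."""
    return [m.group() for m in pattern.finditer(text)]


pattern = build_lookup_regex(COMPANIES)
print(extract_companies("Show revenue for Business Name, LLC", pattern))
# ['Business Name, LLC']
```

Sorting by length is the design choice doing the work here: Python's regex alternation takes the first alternative that matches, not the longest one.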

koaning · August 23, 2021, 12:05pm

Have you seen the use_word_boundaries setting in the docs? It sounds like you can configure the entity extractor that way.

ajgreen630 · August 23, 2021, 2:53pm

Thank you for the reply @koaning. I set use_word_boundaries to True for the DIETClassifier in my pipeline, but that didn’t seem to improve the accuracy for my entity extraction. My pipeline is as follows:

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: DIETClassifier
    epochs: 100
    use_word_boundaries: True

koaning · August 24, 2021, 9:41am

I could be wrong, but by setting use_word_boundaries to “false” you’ll be able to detect “Business Name” as a single entity, but this will also cause “Business Name, LLC” to be detected as a single entity.
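A small sketch of why word boundaries alone don't settle overlapping values. It assumes, for illustration only, that a lookup table becomes a regex by escaping each value, wrapping it in `\b` anchors and joining the alternatives in table order; that is not a claim about Rasa's internals. The value list is hypothetical, with a bare “Business Name” added to show the prefix problem.

```python
import re

# Hypothetical lookup values, in the order a lookup table might list them.
VALUES = ["Business Name", "Business Name, LLC", "Business Name, LLC (subdivision ab)"]


def boundary_regex(values):
    """Escape each value, wrap it in \\b anchors and join them in table order.

    This mimics an assumed lookup-to-regex conversion, not Rasa's own code.
    """
    return re.compile("|".join(r"\b" + re.escape(v) + r"\b" for v in values))


# 1) The shorter alternative is tried first, and "\b" holds between "Name"
#    and ",", so the comma phrase is cut short.
first_hit = boundary_regex(VALUES).search("Look up Business Name, LLC please").group()
print(first_hit)  # Business Name

# 2) A value ending in ")" cannot satisfy a trailing "\b" at the end of a
#    message, because ")" is not a word character.
missing = boundary_regex(VALUES[2:]).search("Look up Business Name, LLC (subdivision ab)")
print(missing)  # None
```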

I’m wondering: would it make sense to have one regex extractor that works across words and another one that detects terms like “LLC”?
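The two-extractor idea can be sketched with two independent patterns plus a merge step that glues a suffix onto the name directly before it. Everything here is hypothetical: the patterns, `spans` and `merge_suffixes` are illustrative stand-ins for two extractors with different labels, not Rasa components.

```python
import re

# Hypothetical patterns: one spans multi-word base names,
# the other only detects legal suffixes such as "LLC".
BASE_NAMES = re.compile(r"(?<!\w)(?:Business Name|Cool Tech)(?!\w)")
SUFFIXES = re.compile(r"(?<!\w)(?:LLC|Inc\.)(?!\w)")


def spans(pattern, text):
    """Return (start, end) offsets of every match."""
    return [(m.start(), m.end()) for m in pattern.finditer(text)]


def merge_suffixes(text, names, suffixes):
    """Extend a name span over a suffix that follows it after ', ' or whitespace."""
    suffix_end_by_start = dict(suffixes)  # start offset -> end offset
    merged = []
    for start, end in names:
        gap = re.match(r",?\s+", text[end:])
        if gap and end + gap.end() in suffix_end_by_start:
            end = suffix_end_by_start[end + gap.end()]
        merged.append(text[start:end])
    return merged


text = "Compare Business Name, LLC with Cool Tech"
print(merge_suffixes(text, spans(BASE_NAMES, text), spans(SUFFIXES, text)))
# ['Business Name, LLC', 'Cool Tech']
```

A name followed by a comma and another name (as in “Business Name, Business Name, LLC”) stays separate, because the merge only fires when the next span is a suffix.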

ajgreen630 · August 24, 2021, 1:22pm

I can see that approach working, but I am also concerned about cases where the user wants to query multiple companies with the same prefix, e.g., “Business Name, Business Name, LLC and Business Name (subdivision ab).” I’d imagine there would be a lot of overhead in my Custom Actions when parsing these entities before sending them to another endpoint, to make sure all of them are included in the query, correct?
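If the extractor already returns each company as its own entity, the Custom Action side can stay small. A minimal sketch, assuming the entities arrive as a list of dicts with "entity", "start" and "value" keys (the shape found in `tracker.latest_message["entities"]` in rasa_sdk); the `company_values` helper, the "company" label and the offsets below are hypothetical.

```python
from typing import Any, Dict, List


def company_values(entities: List[Dict[str, Any]], entity_type: str = "company") -> List[str]:
    """Collect each distinct value of one entity type, in message order."""
    seen = set()
    values = []
    for ent in sorted(entities, key=lambda e: e.get("start", 0)):
        if ent.get("entity") != entity_type:
            continue
        value = ent.get("value")
        if value not in seen:  # drop repeats so the query lists each company once
            seen.add(value)
            values.append(value)
    return values


# Hypothetical parse result for
# "Business Name, Business Name, LLC and Business Name (subdivision ab)"
entities = [
    {"entity": "company", "start": 0, "end": 13, "value": "Business Name"},
    {"entity": "company", "start": 15, "end": 33, "value": "Business Name, LLC"},
    {"entity": "company", "start": 38, "end": 68, "value": "Business Name (subdivision ab)"},
]
print(company_values(entities))
# ['Business Name', 'Business Name, LLC', 'Business Name (subdivision ab)']
```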

koaning · August 27, 2021, 11:51am

It depends a bit on how you fetch the entities, yeah. A spaCy model may help out here, but fine-tuning your own model for this task may be overkill. There’s an online demo for their small models here; their large (lg) models tend to perform a fair bit better.
