Regex based entity Extraction

SaiSatwik · April 30, 2020, 7:28am

I want to extract entities of two types. One is the ‘words’ entity, which accepts alphanumeric strings (including hyphens and underscores). The regex for the ‘words’ entity is [a-zA-Z0-9_\-]*.

The other is the ‘multiWords’ entity, which accepts sentences and words in double quotes. The regex for the ‘multiWords’ entity is "[a-zA-Z0-9_\-][a-zA-Z0-9_\- ]*[a-zA-Z0-9_\-]".

Example: if my sentence is The king shouted “Let the game begin”. then [The, king, shouted] should be extracted as ‘words’ entities and [“Let the game begin”] should be extracted as a ‘multiWords’ entity.
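For reference, here is a minimal sketch of what the two patterns do on the example sentence with plain Python `re`, outside of Rasa. It makes two assumptions that are not in the post: the sentence uses straight double quotes (the curly quotes above look like forum typography and would not match the pattern), and the ‘words’ pattern uses `+` instead of `*`, because `*` also matches empty strings. The quoted span is blanked out before matching words so that its contents are not extracted twice.

```python
import re

# 'words' pattern from the post, with + instead of * (assumption) so that
# empty matches are not returned.
WORDS = re.compile(r"[a-zA-Z0-9_\-]+")

# 'multiWords' pattern from the post, unchanged (straight double quotes).
MULTI_WORDS = re.compile(r'"[a-zA-Z0-9_\-][a-zA-Z0-9_\- ]*[a-zA-Z0-9_\-]"')

sentence = 'The king shouted "Let the game begin".'

# Quoted phrases first.
multi = MULTI_WORDS.findall(sentence)

# Blank out the quoted spans so the words inside them are not matched again.
remainder = MULTI_WORDS.sub(" ", sentence)
words = WORDS.findall(remainder)

print(words)  # ['The', 'king', 'shouted']
print(multi)  # ['"Let the game begin"']
```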

This is my config file

Configuration for Rasa NLU.

Components

language: en
pipeline:
  - name: SpacyNLP
    case_sensitive: true
  - name: SpacyTokenizer
  - name: RegexFeaturizer
  - name: SpacyFeaturizer
  - name: CRFEntityExtractor
  - name: "regex.RegexEntityExtractor"
  - name: EntitySynonymMapper
  - name: SklearnIntentClassifier

Configuration for Rasa Core.

Policies

policies:
  - name: MemoizationPolicy
  - name: MappingPolicy

But I was not able to extract the entities properly with CRFEntityExtractor and RegexEntityExtractor. Can anyone suggest how to do this? Thanks in advance.

Ghostvv · April 30, 2020, 1:23pm

The spaCy tokenizer probably strips the quotes. For such simple rules I don’t see a reason to use ML to extract entities. Just create a custom component that does that.
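A rough sketch of the extraction logic such a component could run is below. It works on the raw message text rather than on tokens, so the tokenizer's handling of quotes does not matter. The function name `extract_entities` is hypothetical, the ‘words’ pattern uses `+` instead of `*` (an assumption, to avoid empty matches), and the returned dicts use the `entity`, `value`, `start`, `end` keys that Rasa uses for entities. Wiring this into a Rasa custom component is not shown, since the component interface depends on the Rasa version.

```python
import re
from typing import Dict, List

# Patterns from the question; WORDS uses + instead of * (assumption).
MULTI_WORDS = re.compile(r'"[a-zA-Z0-9_\-][a-zA-Z0-9_\- ]*[a-zA-Z0-9_\-]"')
WORDS = re.compile(r"[a-zA-Z0-9_\-]+")


def extract_entities(text: str) -> List[Dict]:
    """Return 'multiWords' and 'words' entities with character offsets."""
    entities = []
    quoted_spans = []

    # Quoted phrases take priority.
    for m in MULTI_WORDS.finditer(text):
        quoted_spans.append((m.start(), m.end()))
        entities.append(
            {"entity": "multiWords", "value": m.group(),
             "start": m.start(), "end": m.end()}
        )

    # Single words, skipping anything that lies inside a quoted phrase.
    for m in WORDS.finditer(text):
        if any(s <= m.start() and m.end() <= e for s, e in quoted_spans):
            continue
        entities.append(
            {"entity": "words", "value": m.group(),
             "start": m.start(), "end": m.end()}
        )

    # Return entities in the order they appear in the text.
    return sorted(entities, key=lambda ent: ent["start"])


if __name__ == "__main__":
    for ent in extract_entities('The king shouted "Let the game begin".'):
        print(ent)
```

Inside a component, the returned list would be added to the entities already on the message, and the component would be listed in the pipeline by its module path, as with "regex.RegexEntityExtractor" in the config above.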
