I want to extract entities of two types. The first is the ‘words’ entity, defined by the regex [a-zA-Z0-9_\-]*, which accepts alphanumeric strings (including hyphens and underscores).
The other is the ‘multiWords’ entity, defined by the regex "[a-zA-Z0-9_\-][a-zA-Z0-9_\- ]*[a-zA-Z0-9_\-]", which accepts words and sentences enclosed in double quotes.
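As a quick sanity check of the two patterns on their own, here is a minimal Python sketch using only the standard `re` module (no Rasa involved). The names `WORDS`, `MULTI_WORDS`, `is_words` and `is_multi_words` are made up for illustration. It suggests two edge cases worth knowing about: the `*` in the ‘words’ pattern also accepts the empty string, and the ‘multiWords’ pattern needs at least two characters between the quotes.

```python
import re

# Patterns copied from the post; the constant names are illustrative only.
WORDS = re.compile(r'[a-zA-Z0-9_\-]*')
MULTI_WORDS = re.compile(r'"[a-zA-Z0-9_\-][a-zA-Z0-9_\- ]*[a-zA-Z0-9_\-]"')


def is_words(text):
    """Return True if the whole string is accepted by the 'words' pattern."""
    return WORDS.fullmatch(text) is not None


def is_multi_words(text):
    """Return True if the whole string is accepted by the 'multiWords' pattern."""
    return MULTI_WORDS.fullmatch(text) is not None


print(is_words("game-1_a"))                    # True
print(is_words(""))                            # True: * allows zero characters
print(is_words("two words"))                   # False: spaces are not allowed
print(is_multi_words('"Let the game begin"'))  # True
print(is_multi_words('"a"'))                   # False: needs two characters inside the quotes
print(is_multi_words('" padded "'))            # False: cannot start or end with a space
```

If empty matches are a problem, replacing `*` with `+` in the ‘words’ pattern is one possible adjustment.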
Example: if my sentence is: The king shouted "Let the game begin". then [The, king, shouted] should be extracted as ‘words’ entities and ["Let the game begin"] should be extracted as a ‘multiWords’ entity.
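To make the expected output concrete, here is a rough Python sketch of the extraction I am after, written with plain `re` rather than any Rasa component. It assumes the sentence contains straight double quotes (which is what the regex matches) and the function name `extract_entities` is hypothetical. The dictionaries use `start`, `end`, `value` and `entity` keys to resemble the way Rasa NLU reports entities. Quoted phrases are matched first, and any word that falls inside a quoted phrase is skipped.

```python
import re

MULTI_WORDS = re.compile(r'"[a-zA-Z0-9_\-][a-zA-Z0-9_\- ]*[a-zA-Z0-9_\-]"')
# Assumption: + is used here instead of * so that empty matches are skipped.
WORDS = re.compile(r'[a-zA-Z0-9_\-]+')


def extract_entities(text):
    """Return 'multiWords' and 'words' entities with character offsets."""
    entities = []
    quoted_spans = []

    # Pass 1: quoted phrases become 'multiWords' entities.
    for match in MULTI_WORDS.finditer(text):
        quoted_spans.append(match.span())
        entities.append({
            "start": match.start(),
            "end": match.end(),
            "value": match.group(),
            "entity": "multiWords",
        })

    # Pass 2: single tokens outside any quoted phrase become 'words' entities.
    for match in WORDS.finditer(text):
        inside_quotes = any(
            start <= match.start() and match.end() <= end
            for start, end in quoted_spans
        )
        if not inside_quotes:
            entities.append({
                "start": match.start(),
                "end": match.end(),
                "value": match.group(),
                "entity": "words",
            })

    return sorted(entities, key=lambda ent: ent["start"])


sentence = 'The king shouted "Let the game begin".'
for ent in extract_entities(sentence):
    print(ent["entity"], ent["value"])
```

For the sample sentence this prints three ‘words’ entities (The, king, shouted) followed by one ‘multiWords’ entity ("Let the game begin").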
This is my config file:
# Configuration for Rasa NLU.
# Components
language: en
pipeline:
- name: SpacyNLP
  case_sensitive: true
- name: SpacyTokenizer
- name: RegexFeaturizer
- name: SpacyFeaturizer
- name: CRFEntityExtractor
- name: "regex.RegexEntityExtractor"
- name: EntitySynonymMapper
- name: SklearnIntentClassifier

# Configuration for Rasa Core.
# Policies
policies:
- name: MemoizationPolicy
- name: MappingPolicy
However, I was not able to extract these entities correctly with CRFEntityExtractor and RegexEntityExtractor. Can anyone suggest how to do this? Thanks in advance.