Hi,
I am using Rasa Open Source version 3.6.16.
I'm trying to improve entity recognition.
I have a problem where entities that do not appear in the NLU examples are not recognized, or are confused with another entity even though the training examples for the two entities are different.
For example, if a company name for the `deal_with` entity is not in the training data, then, depending on the pipeline, it is either not extracted at all or extracted as `deal_name`.
Can anyone suggest methods to extract entities more reliably? I am not very familiar with NLU, so any advice would be helpful.
Thank you.
```yaml
nlu:
- intent: deal_with
  examples: |
    - [Deal]{"entity": "deal", "value": "Yes"} will be with [Google]{"entity": "deal_with"}.
    - [planning a deal]{"entity": "deal", "value": "Yes"} with [Microsoft]{"entity": "deal_with"} platform.
    - making a [deal]{"entity": "deal", "value": "Yes"} with [Twitter]{"entity": "deal_with"}.
    - the [deal]{"entity": "deal", "value": "Yes"} is with [linkedin]{"entity": "deal_with"}
    - the [deal]{"entity": "deal", "value": "Yes"} is to be with [Facebook]{"entity": "deal_with"}
- intent: deal_name
  examples: |
    - The [deal]{"entity": "deal", "value": "Yes"} name will be [name]{"entity": "deal_name"}.
    - Consider [name]{"entity": "deal_name"} as the name for our [deal]{"entity": "deal", "value": "Yes"}.
    - The name of the [deal]{"entity": "deal", "value": "Yes"} will be [name]{"entity": "deal_name"}
    - We'll give the name [name]{"entity": "deal_name"} for the [deal]{"entity": "deal", "value": "Yes"} access.
```
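One technique that is often suggested for company names missing from the annotated examples is a lookup table. This is a minimal sketch, assuming the table is added to the NLU training data: the table name `deal_with` is an illustrative choice that mirrors the entity, and the entries are only the companies already used in the examples above, so real use would need a longer list of expected company names. Because `RegexFeaturizer` is already in the pipeline, a match becomes an extra feature for the entity extractors; it does not by itself guarantee that the name is extracted.

```yaml
nlu:
# Hypothetical lookup table; the name mirrors the deal_with entity.
# Entries are the companies from the existing examples only; further
# expected company names would be appended to this list.
- lookup: deal_with
  examples: |
    - Google
    - Microsoft
    - Twitter
    - linkedin
    - Facebook
```

Matching in `RegexFeaturizer` is case-sensitive by default, so spelling variants such as `linkedin` and `LinkedIn` would both need to be listed, or `case_sensitive: False` set on the featurizer.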
```yaml
pipeline:
  - name: "SpacyNLP"
    model: "en_core_web_md"
    # case_sensitive: False
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
    pooling: "mean"  # or "max"
  - name: RegexFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: "CRFEntityExtractor"
  # - name: "SpacyEntityExtractor"
  #   # dimensions to extract
  #   dimensions: ["PERSON", "LOC", "ORG", "PRODUCT"]
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
```
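A possible revision of the pipeline, sketched under assumptions rather than tested: in the configuration above, both `CRFEntityExtractor` and `DIETClassifier` are trained to extract the same entities, which can lead to overlapping or conflicting results, so this sketch keeps only DIET for the custom entities. It also enables the commented-out `SpacyEntityExtractor`, restricted to `ORG`, because spaCy's pretrained model can tag organisation names that never appear in the training data. Those results come back under the entity name `ORG`, not `deal_with`, so they would still need to be mapped onto `deal_with` (for example through a `from_entity` slot mapping), which is not shown here.

```yaml
pipeline:
  - name: "SpacyNLP"
    model: "en_core_web_md"
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
    pooling: "mean"
  - name: RegexFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  # CRFEntityExtractor removed so that DIET is the only trained
  # extractor for deal, deal_with and deal_name (assumption: this
  # avoids overlapping results from two extractors).
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  # Pretrained spaCy NER limited to organisations; matches are
  # reported as entity "ORG", not "deal_with".
  - name: "SpacyEntityExtractor"
    dimensions: ["ORG"]
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
```

The opposite choice is also possible: keep `CRFEntityExtractor` and set `entity_recognition: False` on `DIETClassifier`, so that DIET only classifies intents.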