NLU and pipeline for entities


I am using Rasa Open Source version 3.6.16.

I am trying to improve entity recognition.

My problem is that entity values that do not appear in the NLU examples are either not recognized or are confused with another entity, even though the examples for the two entities are different.

(For example, if a company name for the deal_with entity is not in the training data, then depending on the pipeline it is either not extracted at all or is extracted as deal_name.)

Can anyone suggest methods to extract the entities more reliably? I am not very familiar with NLU, so any advice would be helpful.

Thank you.


nlu:
  - intent: deal_with
    examples: |
      - [Deal]{"entity": "deal", "value": "Yes"} will be with [Google]{"entity": "deal_with"}.
      - [planning a deal]{"entity": "deal", "value": "Yes"} with [Microsoft]{"entity": "deal_with"} platform.
      - making a [deal]{"entity": "deal", "value": "Yes"} with [Twitter]{"entity": "deal_with"}.
      - the [deal]{"entity": "deal", "value": "Yes"} is with [linkedin]{"entity": "deal_with"}
      - the [deal]{"entity": "deal", "value": "Yes"} is to be with [Facebook]{"entity": "deal_with"}

  - intent: deal_name
    examples: |
      - The [deal]{"entity": "deal", "value": "Yes"} name will be [name]{"entity": "deal_name"}.
      - Consider [name]{"entity": "deal_name"} as the name for our [deal]{"entity": "deal", "value": "Yes"}.
      - The name of the [deal]{"entity": "deal", "value": "Yes"} will be [name]{"entity": "deal_name"}
      - We'll give the name [name]{"entity": "deal_name"} for the [deal]{"entity": "deal", "value": "Yes"} access.

pipeline:
  - name: SpacyNLP
    model: en_core_web_md
    # case_sensitive: False
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
    pooling: mean  # or max
  - name: RegexFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CRFEntityExtractor
  # - name: SpacyEntityExtractor
  #   # dimensions to extract
  #   dimensions: ["PERSON", "LOC", "ORG", "PRODUCT"]
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

There’s a good blog post on entity extraction here
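
Beyond that, two things in the training data above may be worth trying. Every deal_name example uses the same literal value, "name", so the model has very little to generalise from, and deal_with is only shown with five company names. Below is a minimal sketch of NLU data with more varied values and a lookup table. The deal names (Project Atlas, Blue Harbor, Q3 Renewal) and the extra company names are made up for illustration and should be replaced with realistic values from your own domain.

```yaml
version: "3.1"
nlu:
  # Lookup table: a list of known values for the deal_with entity.
  # The extra company names are hypothetical; use your own list.
  - lookup: deal_with
    examples: |
      - Google
      - Microsoft
      - Twitter
      - LinkedIn
      - Facebook
      - Amazon
      - Salesforce

  # deal_name examples with varied (hypothetical) values instead of
  # the single placeholder "name".
  - intent: deal_name
    examples: |
      - The [deal]{"entity": "deal", "value": "Yes"} name will be [Project Atlas]{"entity": "deal_name"}.
      - Consider [Blue Harbor]{"entity": "deal_name"} as the name for our [deal]{"entity": "deal", "value": "Yes"}.
      - The name of the [deal]{"entity": "deal", "value": "Yes"} will be [Q3 Renewal]{"entity": "deal_name"}
```

The RegexFeaturizer already in the pipeline turns lookup tables into features for DIETClassifier and CRFEntityExtractor, so this part should not need a config change. A lookup table only helps with the values it lists, so for unseen company names it still matters to have a larger number of examples with varied sentence structures. Note also that CRFEntityExtractor and DIETClassifier both extract entities, so the same span can be returned twice or with conflicting labels. It may be worth comparing results with only one of them active, for example by setting entity_recognition: false on DIETClassifier.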