Multi-word entity detected as several separate entities

Rasa version: 2.3.0

Python version: 3.8.10

Operating system: Ubuntu 20.04

Issue:

I trained a model on an entity that spans multiple words, but at prediction time the model sometimes splits the words of that entity: some of them are assigned to one instance of the entity and the rest to another instance of the same entity. Sometimes it also splits the entity word by word, with each word predicted as a separate instance of the same entity.

I have read issues where this happened to people who use lookup tables. However, I am not using any. The problem only occurs from time to time. Which solution would you recommend?

Example:

  • Training data: “[Force Majeure Event] (my_event) means any act or event, whether foreseen or unforeseen, that satisfies all of the following criteria.”

  • Prediction: Force Majeure - my_event, Event - my_event; OR: Force - my_event, Majeure Event - my_event; OR: Force - my_event, Majeure - my_event, Event - my_event
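To make the symptom concrete, and as one possible stopgap, here is a minimal Python sketch that merges neighbouring predictions of the same entity type back into one span. It assumes the predictions are dicts with `entity`, `start`, `end` and `value` keys, as in Rasa's parse output; the function name `merge_adjacent_entities`, the `max_gap` parameter and the character offsets in the sample are made up for illustration. It is only a post-processing workaround, not a fix for the model itself.

```python
from typing import Dict, List


def merge_adjacent_entities(
    text: str, entities: List[Dict], max_gap: int = 1
) -> List[Dict]:
    """Merge neighbouring predictions that share the same entity label.

    Two predictions are merged when they carry the same `entity` label and
    are separated by at most `max_gap` whitespace characters in `text`.
    (Hypothetical helper, not part of Rasa.)
    """
    merged: List[Dict] = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        if merged:
            prev = merged[-1]
            gap = text[prev["end"]:ent["start"]]
            if (
                prev["entity"] == ent["entity"]
                and len(gap) <= max_gap
                and gap.strip() == ""
            ):
                # Extend the previous span instead of starting a new one.
                prev["end"] = max(prev["end"], ent["end"])
                prev["value"] = text[prev["start"]:prev["end"]]
                continue
        # Copy so the caller's dicts are not modified.
        merged.append(dict(ent))
    return merged


# Illustrative input mimicking the second split pattern listed above.
text = "Force Majeure Event means any act or event."
split_prediction = [
    {"entity": "my_event", "start": 0, "end": 5, "value": "Force"},
    {"entity": "my_event", "start": 6, "end": 19, "value": "Majeure Event"},
]
result = merge_adjacent_entities(text, split_prediction)
print(result)
```

Note that this would also merge two genuinely distinct entities of the same type that happen to sit next to each other, so it only makes sense if that case does not occur in the data.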

Content of configuration file (config.yml):

language: en
pipeline:
  - name: "WhitespaceTokenizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "word"
  - name: "CountVectorsFeaturizer"
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: "DIETClassifier"
    random_seed: 42
    intent_classification: True
    entity_recognition: False
    epochs: 50
    learning_rate: 0.0002
    embedding_dimension: 60
    number_of_transformer_layers: 1
    batch_size: 64
    hidden_layers_sizes:
      text: [256, 128]
    drop_rate: 0.3
    weight_sparsity: 0.9
  - name: "LexicalSyntacticFeaturizer"
    "features": [
        [
          "prefix5",
          "prefix2",
          "suffix5",
          "suffix3",
          "suffix2",
          "digit",
        ],
        [
          "prefix5",
          "prefix2",
          "suffix5",
          "suffix3",
          "suffix2",
          "digit",
        ],
        [
          "prefix5",
          "prefix2",
          "suffix5",
          "suffix3",
          "suffix2",
          "digit",
        ],
        [
          "low",
          "prefix5",
          "prefix2",
          "suffix5",
          "suffix3",
          "suffix2",
          "title",
          "digit",
        ],
        [
          "prefix5",
          "prefix2",
          "suffix5",
          "suffix3",
          "suffix2",
          "digit",
        ],
        [
          "prefix5",
          "prefix2",
          "suffix5",
          "suffix3",
          "suffix2",
          "digit",
        ],
        [
          "prefix5",
          "prefix2",
          "suffix5",
          "suffix3",
          "suffix2",
          "digit",
        ]
      ]
  - name: "DIETClassifier"
    random_seed: 42
    intent_classification: False
    entity_recognition: True
    epochs: 200
    learning_rate: 0.0002
    embedding_dimension: 60
    number_of_transformer_layers: 1
    batch_size: 32
    hidden_layers_sizes:
      text: [256, 128]