Rasa version: 2.3.0
Python version: 3.8.10
Operating system: Ubuntu 20.04
Issue:
I trained a model on an entity consisting of multiple words, but at prediction time the model sometimes splits that entity apart: some of its words are assigned to one instance of the entity and the rest to another. Sometimes it even splits it word by word, with each word predicted as a separate instance of the same entity.
I have read issues where this happened to people using lookup tables; however, I am not using any. It just happens from time to time. Which solution would you recommend?
Example:
- Training data: "[Force Majeure Event](my_event) means any act or event, whether foreseen or unforeseen, that satisfies all of the following criteria."
- Prediction (one of the following splits):
  - Force Majeure - my_event, Event - my_event
  - Force - my_event, Majeure Event - my_event
  - Force - my_event, Majeure - my_event, Event - my_event
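For context, here is a minimal sketch of how such an example is annotated in my training data, using the Rasa 2.x YAML format (the intent name define_term is just a placeholder, not my real intent):

version: "2.0"
nlu:
- intent: define_term  # placeholder intent name
  examples: |
    - [Force Majeure Event](my_event) means any act or event, whether foreseen or unforeseen, that satisfies all of the following criteria.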
Content of configuration file (config.yml):
language: en
pipeline:
  - name: "WhitespaceTokenizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "word"
  - name: "CountVectorsFeaturizer"
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: "DIETClassifier"
    random_seed: 42
    intent_classification: True
    entity_recognition: False
    epochs: 50
    learning_rate: 0.0002
    embedding_dimension: 60
    number_of_transformer_layers: 1
    batch_size: 64
    hidden_layers_sizes:
      text: [256, 128]
    drop_rate: 0.3
    weight_sparsity: 0.9
  - name: "LexicalSyntacticFeaturizer"
    features: [
      ["prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "digit"],
      ["prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "digit"],
      ["prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "digit"],
      ["low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "title", "digit"],
      ["prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "digit"],
      ["prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "digit"],
      ["prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "digit"]
    ]
  - name: "DIETClassifier"
    random_seed: 42
    intent_classification: False
    entity_recognition: True
    epochs: 200
    learning_rate: 0.0002
    embedding_dimension: 60
    number_of_transformer_layers: 1
    batch_size: 32
    hidden_layers_sizes:
      text: [256, 128]
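For reference, when one of the bad splits above happens, the parse result contains several separate entity entries instead of a single one. Here is a sketch of the entities part of the output for the first split, shown as YAML rather than the raw JSON, with illustrative character offsets that assume the phrase starts the message:

entities:
- entity: my_event
  value: "Force Majeure"   # first fragment of the split entity
  start: 0
  end: 13
  extractor: "DIETClassifier"
- entity: my_event
  value: "Event"           # second fragment, predicted as a separate entity
  start: 14
  end: 19
  extractor: "DIETClassifier"

What I would expect instead is a single entry with value "Force Majeure Event" covering the whole span.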