Match synonym before detecting entities

I am working on a feature where a synonym can consist of several words that are also detected as separate entities. Depending on the context, those entities should be dropped in favor of the synonym mapping.

One example:

My goal is to identify the streaming type and the platform of the user.

A: I can watch [Live TV](type) // Does not work: split into [Live](type) and [TV](platform)
B: I am watching football [live](type) on my [tv](platform) // Works
{
  "text": "I like to watch Live TV", // 'Live TV' synonym should be matched
  "intent": {
    "id": 3847976001285428029,
    "name": "inform",
    "confidence": 0.9777395129203796
  },
  "entities": [
    {
      "entity": "type",
      "start": 16,
      "end": 20,
      "confidence_entity": 0.9307569265365601,
      "value": "Live",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "platform",
      "start": 21,
      "end": 23,
      "confidence_entity": 0.5813954472541809,
      "value": "tv",
      "extractor": "DIETClassifier",
      "processors": [
        "EntitySynonymMapper"
      ]
    }
  ]
}

As you can see, DIETClassifier extracts “Live” as a type entity and “TV” as a platform entity, and the EntitySynonymMapper then maps “TV” to “tv”. I want “Live TV” to be recognized as a synonym for the type value “live”, without setting the platform entity at all. Is this possible?

Other files:

# ... domain.yml
entities:
  - type
  - platform

# ... nlu.yml
- intent: inform
  examples: |
    - I can not play [Live-TV](type)
    - I am watching football [live](type) on my [tv](platform)
    - I like to watch [Live TV](type)
    - I love to watch tennis on my [tv](platform)
    - The [live](type) feature is the best
    - [LiveTv](type) is not working on my [tv](platform)

- synonym: live
  examples: |
    - Livestream
    - Live TV
    - Livetv
    - LiveTV
    - Live-TV
    - Lievestream
    - Live-Streaming
    - Livestreams
- synonym: tv
  examples: |
    - TV
    - Fernseher
    - Fernsehgerät
    - television
    - Tele
    - Fehrnseher
    - Fehrnsehgerät
    - fernsehn
    - fernseher

# ... config.yml
language: de
pipeline:
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: RegexFeaturizer
  - name: DIETClassifier
    epochs: 20
    constrain_similarities: true
  - name: EntitySynonymMapper
  # ...
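A rough way to see why the EntitySynonymMapper cannot produce the desired result is to model it in a few lines. The sketch below is a simplified model, not Rasa's code: it assumes the mapper looks up each entity value that an extractor (here DIETClassifier) has already produced, lowercased, in a table built from the synonym examples. Under that assumption it can rename a value but can never merge two entities into one or drop one. `build_synonym_table` and `map_synonyms` are hypothetical names.

```python
from typing import Dict, List

# Simplified model of synonym mapping (hypothetical helpers, not Rasa's API).


def build_synonym_table(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each lowercased variant to its canonical value."""
    table: Dict[str, str] = {}
    for canonical, variants in synonyms.items():
        for variant in variants:
            table[variant.lower()] = canonical
    return table


def map_synonyms(entities: List[dict], table: Dict[str, str]) -> List[dict]:
    """Rewrite the value of each extracted entity that is a known variant.

    Entities are only renamed here: none are merged, split or removed.
    """
    mapped = []
    for entity in entities:
        entity = dict(entity)  # shallow copy, so the caller's dicts stay unchanged
        key = str(entity["value"]).lower()
        if key in table:
            entity["value"] = table[key]
            # Build a new list rather than appending to a possibly shared one.
            entity["processors"] = entity.get("processors", []) + ["EntitySynonymMapper"]
        mapped.append(entity)
    return mapped


table = build_synonym_table({
    "live": ["Livestream", "Live TV", "Livetv", "LiveTV", "Live-TV"],
    "tv": ["TV", "Fernseher", "television"],
})

# Entities as DIETClassifier returned them for "I like to watch Live TV".
split_result = [
    {"entity": "type", "value": "Live"},
    {"entity": "platform", "value": "TV"},
]
# Entities as they would look if "Live TV" had been extracted as one span.
joined_result = [{"entity": "type", "value": "Live TV"}]

print(map_synonyms(split_result, table))
print(map_synonyms(joined_result, table))
```

Fed the split result, this model reproduces the output above: "Live" is not a listed variant and stays unchanged, while "TV" becomes "tv". The "Live TV" synonym only applies when the extractor already returned "Live TV" as a single entity, so the fix has to happen in entity extraction, not in the mapper.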

@r3c0nf1gur3d Can you please add the training examples for this to the post above?

@r3c0nf1gur3d Please check this blog post for reference: RASA - Synonyms - DEV Community

@r3c0nf1gur3d Please check this YouTube video for reference: HOW TO USE SYNONYMS WITH RASA X | INNOVATE YOURSELF - YouTube. The same applies to Rasa Open Source; just look for the missing piece of the puzzle :slight_smile:

@nik202 Thanks for the response. I updated the initial question.

Regarding training: I reproduced the issue using the default Rasa project and added a couple of sample sentences under the inform intent.

Adding them doesn't change the result for me, so I am a bit confused. Does the whitespace in ‘Live TV’ split the synonym somehow?

Thanks for your feedback :upside_down_face:

@r3c0nf1gur3d Have you seen the video?

Ehm, yes, I have skimmed through it, but I don’t see how it helps me. Do you understand my issue?

@nik202 Maybe I am also explaining it in a confusing way. In my example:

A: I like to watch Live-TV → Synonym Live-TV for live is correctly detected

B: I like to watch Live TV → Synonym Live TV for live is not detected

Instead, it extracts the two words separately: Live as type and TV as platform. I don’t want that, because in this context it is wrong; for example, it is possible to watch “Live TV” in your browser.
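What the mapper receives depends on how the text is tokenized and tagged. The sketch below is a simplified model under two assumptions: that WhitespaceTokenizer behaves roughly like Python's str.split() for these inputs (the real tokenizer may treat punctuation differently), so "Live-TV" stays one token while "Live TV" becomes two; and that DIETClassifier predicts one BILOU tag per token (B = begin, I = inside, L = last, U = unit, O = outside). For "Live TV" to come out as one type entity, the model must predict B-type for "Live" and L-type for "TV"; predicting U-type and U-platform gives the split result. `tokenize` and `decode_bilou` are hypothetical helpers.

```python
from typing import List, Optional, Tuple


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split on whitespace, keeping each token's character offsets (simplified)."""
    tokens = []
    offset = 0
    for word in text.split():
        start = text.index(word, offset)
        end = start + len(word)
        tokens.append((word, start, end))
        offset = end
    return tokens


def decode_bilou(text: str, tags: List[str]) -> List[dict]:
    """Turn one BILOU tag per token into entity spans."""
    entities = []
    open_entity: Optional[Tuple[str, int]] = None  # (label, start) of an unfinished span

    def emit(label: str, start: int, end: int) -> None:
        entities.append({"entity": label, "start": start, "end": end, "value": text[start:end]})

    for (_, start, end), tag in zip(tokenize(text), tags):
        if tag == "O":
            open_entity = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "U":
            emit(label, start, end)
            open_entity = None
        elif prefix == "B":
            open_entity = (label, start)
        elif open_entity is not None and open_entity[0] == label:
            if prefix == "L":
                emit(label, open_entity[1], end)
                open_entity = None
            # an "I" tag with a matching label keeps the span open
        else:
            open_entity = None  # I/L tag without a matching B: ignore it
    return entities


text = "I like to watch Live TV"
print(tokenize(text))

# Tags that reproduce the observed output: two single-token entities.
print(decode_bilou(text, ["O", "O", "O", "O", "U-type", "U-platform"]))
# Tags that would give one "type" entity covering "Live TV".
print(decode_bilou(text, ["O", "O", "O", "O", "B-type", "L-type"]))
# With a hyphen, the same words form a single token.
print(decode_bilou("I like to watch Live-TV", ["O", "O", "O", "O", "U-type"]))
```

In this model the whitespace matters only because it produces two tokens that the classifier must learn to tag as one span. That fits the thread's eventual finding that giving DIET more training epochs fixed the issue.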

@r3c0nf1gur3d This is the second time I have seen this issue, where some values are extracted and others are not. Please share the related files for your use case, and I will do a dry run of your code if I get some time over the weekend.

@nik202 Thanks a lot

https://github.com/RWolfing/Rasa-Synonym-Detection includes a sample project with the training data.

Run rasa shell nlu and enter:
"I like to watch some Live TV"
 "entities": [
    {
      "entity": "type",
      "start": 16,
      "end": 20,
      "confidence_entity": 0.9307569265365601,
      "value": "Live",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "platform",
      "start": 21,
      "end": 23,
      "confidence_entity": 0.5813954472541809,
      "value": "tv",
      "extractor": "DIETClassifier",
      "processors": [
        "EntitySynonymMapper"
      ]
    }
  ]

However, I would expect the same result as for, e.g.:

I want to watch some LiveTV
"entities": [
    {
      "entity": "type",
      "start": 16,
      "end": 22,
      "confidence_entity": 0.9643778800964355,
      "value": "live",
      "extractor": "DIETClassifier",
      "processors": [
        "EntitySynonymMapper"
      ]
    }
  ]

@nik202 Just a short ping: were you able to reproduce the problem with the GitHub project?

@r3c0nf1gur3d I am sorry, bro. I was busy over the weekend and am busy today and tomorrow too, but I will try your use case tomorrow evening, as soon as I get some time. Sorry again; I will set a reminder now :slight_smile:

@r3c0nf1gur3d Try annotating only the main synonym value in the training examples: rather than Live-TV, Live TV or tv, just use live, which is the main synonym. Please check how the training examples are annotated in this example from the GitHub repo: https://github.com/RasaHQ/rasa-demo/blob/main/data/nlu/nlu.yml#L2213 I hope this helps until I get to check your code. Good luck!

FYI, in case someone runs into the same issue:

Increasing the number of training epochs of the DIETClassifier fixed the issue. For development, I had reduced it to 20 epochs to speed up training. That was not a good idea :slight_smile:.

In my experience, the threshold is a minimum of around 60 epochs; lower values decrease entity recognition quality significantly.

@r3c0nf1gur3d Thanks a lot for sharing the solution.