I am working on a feature where a synonym can contain words that are also entities on their own. Depending on the context, those individual entities should be dropped in favor of the synonym mapping.
One example:
My goal is to identify the streaming type and the platform of the user.
A: I can watch [Live TV](type) // Does not work: split into [Live](type) and [TV](platform)
B: I am watching football [live](type) on my [tv](platform) // Works
{
  "text": "I like to watch Live TV", // 'Live TV' synonym should be matched
  "intent": {
    "id": 3847976001285428029,
    "name": "inform",
    "confidence": 0.9777395129203796
  },
  "entities": [
    {
      "entity": "type",
      "start": 16,
      "end": 20,
      "confidence_entity": 0.9307569265365601,
      "value": "Live",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "platform",
      "start": 21,
      "end": 23,
      "confidence_entity": 0.5813954472541809,
      "value": "tv",
      "extractor": "DIETClassifier",
      "processors": [
        "EntitySynonymMapper"
      ]
    }
  ]
}
As we can see, “Live” is extracted as a separate type entity, and the EntitySynonymMapper only triggers for “tv”. I want “Live TV” as a whole to be recognized as a synonym for the type “live”, without setting the platform entity at all. Is this possible?
Other files:
# ... domain.yml
entities:
  - type
  - platform

# ... nlu.yml
- intent: inform
  examples: |
    - I can not play [Live-TV](type)
    - I am watching football [live](type) on my [tv](platform)
    - I like to watch [Live TV](type)
    - I love to watch tennis on my [tv](platform)
    - The [live](type) feature is the best
    - [LiveTv](type) is not working on my [tv](platform)
- synonym: live
  examples: |
    - Livestream
    - Live TV
    - Livetv
    - LiveTV
    - Live-TV
    - Lievestream
    - Live-Streaming
    - Livestreams
- synonym: tv
  examples: |
    - TV
    - Fernseher
    - Fernsehgerät
    - television
    - Tele
    - Fehrnseher
    - Fehrnsehgerät
    - fernsehn
    - fernseher
@nik202 thanks for the response. I updated the initial question.
Regarding the training: I reproduced the issue with the default Rasa project by adding a couple of sample sentences under the inform intent.
For me it doesn't change the result, so I am a bit confused. Does the whitespace in ‘Live TV’ split the synonym somehow?
@nik202 Maybe I am also explaining it in a confusing way. In my example:
A: I like to watch Live-TV → Synonym Live-TV for live is correctly detected
B: I like to watch Live TV → Synonym Live TV for live is not detected
Instead, it extracts the two words separately: Live as type and TV as platform. I don’t want that, because in this context it is wrong. For example, it is possible to watch “Live TV” in your browser.
@r3c0nf1gur3d This is the second time I have seen this issue, where some values are extracted and others are not. Please share the related files for your use case; I will do a dry run of your code if I get some time on the weekend.
@r3c0nf1gur3d Sorry, I was busy over the weekend and I am still busy today and tomorrow. I will try tomorrow evening, and I will definitely look at your use case as soon as I get some time. I am setting a reminder now.
@r3c0nf1gur3d Try using only the main synonym name, i.e. live, in the training examples, rather than Live-TV, Live TV or tv. Please check this example in the GitHub repo: https://github.com/RasaHQ/rasa-demo/blob/main/data/nlu/nlu.yml#L2213 and look at how the training examples are written there. I hope this helps until I can check your code. Good luck!
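One possible reading of this suggestion, assuming the Rasa 2.x or later YAML training-data format, is to attach the canonical value directly to the annotation with the inline synonym syntax. This is only a sketch built from the sentences in the question, not a copy of what the linked rasa-demo file does. Note that the synonym is applied only after the extractor has tagged the whole span "Live TV" as one entity, so the model still has to learn that span from the examples.

```yaml
# nlu.yml (sketch; sentences taken from the question above)
- intent: inform
  examples: |
    - I like to watch [Live TV]{"entity": "type", "value": "live"}
    - I can not play [Live-TV]{"entity": "type", "value": "live"}
    - [LiveTv]{"entity": "type", "value": "live"} is not working on my [tv](platform)
    - The [live](type) feature is the best
```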
Increasing the number of training epochs for the DIETClassifier fixed the issue. For development purposes I had reduced it to 20 epochs to speed up training. That was not a good idea.
In my experience, you need at least around 60 epochs. Lower values degrade entity recognition significantly.
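For reference, the epoch count is set on the DIETClassifier entry in config.yml. The snippet below is a minimal sketch: the surrounding pipeline components are an assumption (the actual pipeline was not posted), and 100 is just an illustrative value above the roughly 60 epochs mentioned here.

```yaml
# config.yml (sketch; pipeline components other than DIETClassifier are assumed)
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100            # was 20 during development, which was too low
  - name: EntitySynonymMapper
```

After changing the value, the model has to be retrained (rasa train) before the difference shows up in the parsed output.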