Entity Synonyms Confusion

Hi Rasa folks,

I have 3 entities, and for each entity I have a list of defined items. I used a lookup table, but I did not get good results even though all of the entities mentioned in the user utterances exist in the tables. Therefore, I want to check the performance of the synonyms feature.

In my bot, I have an entity called cities, which includes the city name and its synonyms. For example:

  • Value: New York , Synonyms: NYC and newyork
  • Value: Los Angeles , Synonyms: LA and losangelos

My question is: do I need to include all the entity values or their synonyms in the training data? For example:

  • book me a ticket to nyc
  • reserve me a ticket to New York . . .

Do I need to include ‘Los Angeles’ in the training set in order to define the entity name? Or am I missing something?

Looking forward to your replies :slight_smile:

I’m not sure I understood correctly what you mean. Synonyms are used to override the value of the extracted entity after NER. To increase performance, it is better to include as many variations as possible.
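To make the "override after NER" point concrete, here is a minimal sketch of what a synonym lookup does conceptually. It is a plain-dict illustration and not Rasa's actual implementation; the names `SYNONYMS` and `normalize_entity` are made up for this example, and the case-insensitive lookup is an assumption.

```python
# Illustrative only: a plain-dict stand-in for a synonym mapper that runs
# after entity extraction. This is not Rasa's actual code.
SYNONYMS = {
    "nyc": "New York",
    "newyork": "New York",
}


def normalize_entity(extracted_text: str) -> str:
    """Return the canonical value for an extracted entity, if one is known.

    The lookup is case-insensitive (an assumption of this sketch); text
    without a known synonym is returned unchanged.
    """
    return SYNONYMS.get(extracted_text.lower(), extracted_text)


print(normalize_entity("NYC"))
print(normalize_entity("Boston"))
```

Note that the mapping only rewrites the value of an entity that has already been extracted. It does not help the extractor find "NYC" in the first place, which is why varied training examples still matter.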

Thanks @Ghostvv for your reply.

Let me rephrase the problem, I have 2 entities: 1- CityNames 2- Dishes.

Initially, I used the lookup table features since I have a specific list of the defined entities. Unfortunately, the results were not that good :frowning:

Therefore, I switched to using entity synonyms. For example, for the CityNames entity I have the following:

  • Value: New York , Synonyms: NYC and newyork
  • Value: Los Angeles , Synonyms: LA and losangelos . . .

My question is: in the training set, do I need to include all the cities with all their values (e.g. nyc and LA …etc.)? If yes, then the intents that include the CityNames entity will have many more training utterances than the intents without entities. Will the model then be biased? I hope this clarifies the issue.

Thanks in advance :slight_smile:

There is no definite rule. Different things should be tried.

Thanks again @Ghostvv for your reply.

But if I train the model on the synonyms of all the values that I have, I will most probably get an overfitted model. Does this sound correct, or am I missing something?

Might be, but if you include all the values, there is nothing left to overfit to. It is better to try it first before drawing hypothetical conclusions.

@Ghostvv I trained the model on 13 intents, 9 of which have CityNames as an entity. For the intents that have CityNames as an entity, I included all the values (with their synonyms) in the training set. For example,

  • “book a ticket to LA” (and I list all the synonyms of LA)
  • “book a ticket to NYC” (and I list all the synonyms of NYC)
  • “book a ticket to FL” (and I list all the synonyms of FL) and so on

When I tested the model, it was too biased, because the training sets of the intents that have the CityNames entity are large: their examples are duplicated except for the entity (in order to include all the CityNames values), as shown in the example above.

So, am I missing anything?

In the JSON format, you can provide examples without intents.

I was unaware that I could provide examples without intents. Is there a link showing how to do this? I am unable to find it in Training Data Format.

Training Data Format: just do not provide the intent key.
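A rough sketch of what that could look like, built in Python and printed as JSON. The layout (`rasa_nlu_data`, `common_examples`, `entity_synonyms`) follows the legacy Rasa NLU JSON training format as I understand it, so please check it against the Training Data Format page for your version. The intent name `book_ticket` and the helper `make_example` are hypothetical, and that an example without an `intent` key is accepted is taken from the reply above rather than verified here.

```python
import json


def make_example(text, entity_text, entity_name, intent=None):
    """Build one training example; the "intent" key is omitted when intent is None."""
    start = text.find(entity_text)
    if start < 0:
        raise ValueError(f"{entity_text!r} not found in {text!r}")
    example = {
        "text": text,
        "entities": [
            {
                "start": start,
                "end": start + len(entity_text),
                "value": entity_text,
                "entity": entity_name,
            }
        ],
    }
    if intent is not None:
        example["intent"] = intent
    return example


training_data = {
    "rasa_nlu_data": {
        "common_examples": [
            # Regular example: carries both an intent and an entity.
            make_example("book me a ticket to New York", "New York",
                         "CityNames", intent="book_ticket"),
            # Entity-only example: no "intent" key at all.
            make_example("a ticket to NYC", "NYC", "CityNames"),
        ],
        "entity_synonyms": [
            {"value": "New York", "synonyms": ["NYC", "newyork"]},
        ],
    }
}

print(json.dumps(training_data, indent=2))
```

The idea is that the entity-only examples can cover the remaining city values without inflating the example count of any one intent.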

Many thanks @Ghostvv, I will try it and get back to you.

Hello @varton

I am facing the exact same situation… have you found a solution for listing those entities in the training data?