Partial and conflicting list item utterances

I’m particularly interested in cases where list items are specified only partially, but the whole meaning is still extracted into the correct list without conflicting with any other entity/synonym. For this example, let’s say the bot returns a vehicle model when you specify the number of wheels and the number of passengers seated.

The user can utter something like -

Could you give me a list of 2 and 3 wheelers that can seat 3?

In this case, the “2 and 3 wheeler” needs to be extracted into a list wheelCount:[2, 3] and the last “3” needs to be extracted as a seatCount:[2]. The annotation into a list for the wheelCount entity works well when I do [2](wheelCount) and [3 wheelers](wheelCount) as in this other thread here. Also, for my bot this is not an unusual example; it is fairly common for people to phrase their requests this way.

How do I annotate it without running into a lot of synonym conflicts across the two entities?

I assume you mean seatCount:[3] in your example.

What exactly do you mean by synonym conflicts? Can you please give an example?

Hello @Tanja, thank you for replying and sorry for the delay.

That is indeed correct. I meant seatCount:[3] in the OP example.

For the synonym conflict, let’s say the values of seatCount and wheelCount are 2-seater, 3-seater, 4-seater etc. and 2-wheeler, 3-wheeler, 4-wheeler etc. respectively. While defining the other synonyms for these values, I’m struggling to annotate text utterances that can belong to both value sets.

For example, the synonyms for 2-seater would be -

- two-seater
- two seater
- 2 seater
- 2 seat
- 2-seat
- 2

and that for 2-wheeler would be -

- two-wheeler
- two wheeler
- 2 wheeler
- bicycle
- motorbike
- bike
- 2

The 2 appears in both synonym definitions, and because of this I get a ton of warnings while training the bot. Ultimately, in the sentence in the OP, it should capture the 2 in the utterance as wheelCount:[2-wheeler].

Yes, the 2 can only be in one list. If you first define the synonyms for 2-seater and afterwards the synonyms for 2-wheeler, and both contain 2, the 2 for 2-seater would be ignored. 2 would just be a synonym for 2-wheeler in that case.

Did you try removing the 2 from the synonym lists and just annotating it in your examples? The NER should be able to tell from the context whether the 2 belongs to 2-seater or 2-wheeler.
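To make the suggestion concrete, here is a small sketch of what "just annotate it in your examples" could look like. The two sentences are made-up training examples in the [text](entity) style used earlier in this thread, and parse_annotations is a hypothetical helper written only for illustration (it is not part of Rasa). It shows that the same bare digit ends up labelled by its context rather than by a synonym list.

```python
import re

# Hypothetical training sentences; no synonym lists are defined for the bare digits.
EXAMPLES = [
    "Could you give me a list of [2](wheelCount) and [3 wheelers](wheelCount) "
    "that can seat [3](seatCount)?",
    "I need a [2 seater](seatCount) with [4](wheelCount) wheels",
]

# Matches one [text](entity) annotation.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotations(sentence):
    """Return (plain_text, [(entity, text), ...]) for one annotated sentence."""
    entities = [(m.group(2), m.group(1)) for m in ANNOTATION.finditer(sentence)]
    plain = ANNOTATION.sub(lambda m: m.group(1), sentence)
    return plain, entities


for example in EXAMPLES:
    plain, entities = parse_annotations(example)
    print(plain)
    print(entities)
```

In the first sentence the digit 3 appears twice, once as wheelCount and once as seatCount, which is exactly the distinction the extractor is expected to learn from the surrounding words.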

Thank you @Tanja! I will try this today and report back on this thread. I imagine I will need a lot of training examples covering the full range of combinations for such utterances. How many do you suppose I should include, and is there a certain policy I should adopt for the bot for such examples?

It’s hard to say how many examples you need. The more the better. Just try it out. What policy are you currently using? In general, though, no special policy is required for the problem you want to solve.

Hello @Tanja, great news! I was able to get the bot to work in this way. No change in pipeline/policy was required.

I was a bit premature to call this a success @Tanja. It looks like once I added more training examples without specifying the synonyms, it started throwing this kind of error -

Found inconsistent entity synonyms while reading markdown, overwriting 1->1-wheeler with 1->1-seater during merge.

This has also started to negatively impact the NER for such examples. Any suggestions?

mmhh… we add all entities we find in the training data to the list of synonyms. So, if there is a 1 for two different kinds of entities, you will get the warning you mentioned. However, if you do not have the EntitySynonymMapper in your pipeline, those synonyms will not be used.

Can you try to remove the EntitySynonymMapper and test again? You will still get the warnings, but you can simply ignore them.

I’ll think about a solution to suppress the warning in such cases.

Much appreciated @Tanja! Are there any drawbacks to removing the EntitySynonymMapper? Is it just that the synonym definitions would need to be manual instead?

If you remove the EntitySynonymMapper, no synonyms will be used at all. So, you can remove the synonym sections in your NLU training data files. The CRFEntityExtractor should be able to extract entities pretty well. The EntitySynonymMapper would just help to correct some entities with its mapping. However, as the mapping in your case would always map 1 to 1-seater, it would not help, as 1 might also represent a 1-wheeler.
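The overwrite behaviour described here can be illustrated with a simplified model. This is not Rasa's actual code: merge_synonyms is a hypothetical function, and the assumptions are that the mapping is one flat dictionary keyed by the entity text alone (no entity type) and that keys are lowercased.

```python
def merge_synonyms(pairs):
    """Merge (text, value) pairs into one flat dict; later pairs overwrite earlier ones."""
    mapping = {}
    conflicts = []
    for text, value in pairs:
        key = text.lower()  # assumption: lookups are case-insensitive
        if key in mapping and mapping[key] != value:
            # Same text already points somewhere else: record it, then overwrite.
            conflicts.append((key, mapping[key], value))
        mapping[key] = value
    return mapping, conflicts


# Made-up pairs mirroring the warning quoted above.
pairs = [("1", "1-wheeler"), ("one wheeler", "1-wheeler"), ("1", "1-seater")]
mapping, conflicts = merge_synonyms(pairs)
print(mapping["1"])
print(conflicts)
```

Because the key carries no entity type, a single entry for "1" has to serve both wheelCount and seatCount, so whichever definition is merged last wins for both.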

Okay, I’ll try it without the EntitySynonymMapper and report what I find.

Hello @Tanja, so what happened was that it identified the example below in the following way -

Could you give me a list of 2 and 3 wheelers that can seat 3?

wheelCount:[2, 3 wheelers]

What appears to be happening is that the slots are being retained with the exact text that the utterances contain (typos and all) and we lose the structured data that could be used to query the db. An alternative would be that we keep a list of synonyms in our system and map to the right value before querying the db, but this would mean we lose the NLU advantage of Rasa in this case. If we do that, then typos and newer variations would not be identified by our list approach.
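For reference, the "map it ourselves before querying the db" alternative could look roughly like the sketch below. Everything in it is hypothetical: canonical_value, the WORD_TO_DIGIT table and the SUFFIX table are illustrative names, and the canonical form "N-wheeler" / "N-seater" is taken from the values mentioned earlier in the thread.

```python
import re

# Assumed, deliberately tiny lookup tables for illustration only.
WORD_TO_DIGIT = {"one": "1", "two": "2", "three": "3", "four": "4"}
SUFFIX = {"wheelCount": "wheeler", "seatCount": "seater"}

# A run of digits, or one of the known number words as a whole word.
NUMBER = re.compile(r"\d+|\b(?:" + "|".join(WORD_TO_DIGIT) + r")\b")


def canonical_value(entity, text):
    """Map raw extracted text to e.g. '3-wheeler', or None if it cannot be mapped."""
    match = NUMBER.search(text.lower())
    if match is None or entity not in SUFFIX:
        return None
    number = WORD_TO_DIGIT.get(match.group(0), match.group(0))
    return f"{number}-{SUFFIX[entity]}"


print(canonical_value("wheelCount", "3 wheelers"))
print(canonical_value("seatCount", "two seater"))
print(canonical_value("wheelCount", "bike"))
```

Since the entity name is part of the lookup, a bare "2" is resolved correctly for either entity. The weakness is the one already noted: values such as "bike" or misspellings fall through unless they are added by hand.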

Looks like I’ll have to re-add the EntitySynonymMapper as a compromise and make sure that the overwrite is better suited to our needs. Are training examples further down the list given more importance during training?

Are training examples further down the list given more importance during training?

No, all training examples are treated the same.

In case you are up to writing some custom code, you could also write your own component, see Custom NLU Components. You could use the current EntitySynonymMapper as a basis (rasa/ at master · RasaHQ/rasa · GitHub) and modify it to your needs. E.g. you could remove conflicting entries from the mapping.
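As a rough illustration of "remove conflicting entries from the mapping", here is a framework-independent sketch. It does not use the Rasa component API, and how it would be wired into a custom component is not shown; build_mapping_without_conflicts is a hypothetical helper and the pairs are made-up data based on the synonym lists earlier in this thread.

```python
from collections import defaultdict


def build_mapping_without_conflicts(pairs):
    """Keep only texts that map to exactly one value; drop ambiguous ones like '2'."""
    targets = defaultdict(set)
    for text, value in pairs:
        targets[text.lower()].add(value)
    return {
        text: next(iter(values))
        for text, values in targets.items()
        if len(values) == 1
    }


pairs = [
    ("2", "2-seater"),
    ("two seater", "2-seater"),
    ("2", "2-wheeler"),
    ("bike", "2-wheeler"),
]
print(build_mapping_without_conflicts(pairs))
```

With this approach unambiguous variations such as "bike" are still normalised, while a bare "2" is left as extracted, so its final value would have to be resolved from the entity name in a later step.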

Thanks @Tanja; I’ll give this a shot!