Problem with synonyms

Hi all,

I’m new to Rasa and I’m trying to define a mapping between department codes and the words (descriptions) that may refer to them.

For example:

## synonym:LA014
- Data Management
- DataManagement
- Storage Management

Then I defined an intent and I listed all codes with some training data:

## intent:find_available
- Can you give me the available for [Data Management](code)?
- [LA014](code)
- ... list of all codes ...

A synonym works only if I provide an example of it in the training data.

Since the list of codes is quite long and the description length varies greatly, how can I extract entity codes without specifying each of them in the training data?

Thanks

Hi Gabriel, have you added EntitySynonymMapper to your pipeline?

Hi @zezutom,

Thanks for your reply.

Yes, of course. This is my pipeline:

pipeline:
  - name: WhitespaceTokenizer
    case_sensitive: False
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

Hi @GabrieleRomeo, okay, I’ve played with your example for a bit… I defined the pipeline the same way you did, except that I added “CRFEntityExtractor” in front of the synonym mapper:

- name: CRFEntityExtractor
- name: EntitySynonymMapper

And in my nlu.md I’ve added:

## intent:find_available
- Can you give me the available for [Data Management](code)?
- Data for [LA014](code)
- Can I get input for [LA014](code)?

## synonym:LA014
- Data Management
- DataManagement

Then both Data Management and DataManagement are correctly interpreted as the code LA014.
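
For readers wondering why the extractor matters: the synonym mapper only rewrites the value of an entity that some extractor has already found, it does not find entities on its own. Below is a minimal plain-Python sketch of that second step, not Rasa's actual implementation. `SYNONYMS` and `map_synonym` are hypothetical names, and the case-insensitive lookup is an assumption made for illustration.

```python
# Hypothetical synonym table mirroring the "## synonym:LA014" section above.
# Keys are lower-cased surface forms; values are the canonical code.
SYNONYMS = {
    "data management": "LA014",
    "datamanagement": "LA014",
}


def map_synonym(entity: dict) -> dict:
    """Return a copy of an already-extracted entity with its value normalised.

    If the surface form is not in the table, the value is left unchanged.
    """
    value = str(entity["value"])
    mapped = SYNONYMS.get(value.lower(), value)
    return {**entity, "value": mapped}


# An entity as an extractor might report it (assumed shape).
extracted = {"entity": "code", "value": "DataManagement"}
print(map_synonym(extracted))
```

If no extractor tags "DataManagement" as a `code` entity in the first place, there is nothing for this step to rewrite, which matches the behaviour described earlier in the thread.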

Hi @zezutom

It works

Thank you :wink:

Cool, glad I could help :slight_smile:

Hi @zezutom

I’ve encountered another strange behaviour with code extraction.

If a synonym list contains just one word, the code extraction fails and the system returns the actual name (i.e. DataManagement and not LA014):

## synonym:LA014
- DataManagement

Is there a minimum threshold for synonyms in Rasa?

Hi @GabrieleRomeo, I don’t think there is a minimum threshold for synonyms. Have you tried using them inline?

- Can you give me the available for [DataManagement]{"entity": "code", "value": "LA014"}

I know it’s more repetitive, but it’s worth a try.
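
Since the inline form gets repetitive with a long list of codes, the annotated lines could be generated rather than typed by hand. The following is a rough sketch in plain Python, under the assumption that the code-to-description mapping is available as a dictionary. `DEPARTMENTS`, `TEMPLATE`, `annotate` and `build_examples` are hypothetical names, and the output is only text to paste into nlu.md.

```python
import json

# Hypothetical mapping from department code to its descriptions.
DEPARTMENTS = {
    "LA014": ["Data Management", "DataManagement"],
    "N0280": ["Disaster Recovery", "DR"],
}

# Example sentence taken from the thread; {mention} is filled per description.
TEMPLATE = "Can you give me the available for {mention}?"


def annotate(text: str, entity: str, value: str) -> str:
    """Build an inline annotation such as [DR]{"entity": "code", "value": "N0280"}."""
    meta = json.dumps({"entity": entity, "value": value})
    return f"[{text}]{meta}"


def build_examples(departments, template=TEMPLATE, entity="code"):
    """Return one training line per (code, description) pair."""
    lines = []
    for code, descriptions in departments.items():
        for description in descriptions:
            mention = annotate(description, entity, code)
            lines.append("- " + template.format(mention=mention))
    return lines


if __name__ == "__main__":
    print("## intent:find_available")
    print("\n".join(build_examples(DEPARTMENTS)))
```

Using several templates instead of one would give the model more varied sentences to learn from.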

Hi @zezutom

thanks for your suggestion.

I have another problem now.

As I said, I have a long list of department names and descriptions which are used to fill a specific form. For example:

## synonym:N0280
- Disaster Recovery
- DR

## synonym:LA014
- Data Management
- DataManagement

And the intent description is as follows:

## intent:find_available
- Can you give me the available for [Data Management](code)?
- Can you give me the available for [DR](code)?
- Data for [LA014](code)
- Data for [N0280](code)
- Can I get input for [LA014](code)?
- Can I get input for [N0280](code)?

This works well. However, I need to define another intent which (in some cases) contains entities (synonyms) with the same text as before. For example:

## synonym:1
- DR

## intent:find_status
- Can you give me the status of the [DR](informative_system) system?
- Can you give me the status of system number [1](informative_system) ?

This creates a name collision, and entity extraction goes wrong: informative_system is mapped to N0280 instead of 1.

How can I structure my NLU so that different intents refer to different synonyms?

Thanks
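
One way to picture the collision, as a plain-Python sketch and not a description of Rasa's internals: if synonyms live in one flat table keyed only by the surface text, "DR" can have just one target regardless of which entity it was tagged as. A possible workaround, assuming the entities are post-processed in your own code (for example in a custom action), is to key the table by entity type as well. All names below are hypothetical.

```python
# Flat table: one target per surface form, so "DR" cannot mean both things.
FLAT = {
    "dr": "N0280",
    "disaster recovery": "N0280",
}

# Table keyed by (entity type, surface form) keeps the two meanings apart.
BY_ENTITY = {
    ("code", "dr"): "N0280",
    ("code", "disaster recovery"): "N0280",
    ("informative_system", "dr"): "1",
}


def resolve(entity_type: str, text: str) -> str:
    """Look up the canonical value for a surface form within one entity type.

    Unknown combinations are returned unchanged.
    """
    return BY_ENTITY.get((entity_type, text.lower()), text)


print(FLAT["dr"])                         # the only value the flat table can give
print(resolve("code", "DR"))              # department code
print(resolve("informative_system", "DR"))  # system number
```

This only helps if the extractor assigns the right entity type to "DR" in each sentence, so the two intents still need enough distinct examples.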

Hi @GabrieleRomeo, OK, are you sure the two intents (find_available, find_status) are correctly distinguished from each other? If not, I would try adding more examples of find_status.

Another option that comes to my mind is splitting the model into two separate models. One for departments and the other one for informative systems. In that case you’d have two different Rasa agents on the backend, each handling a dedicated model - see my blog post on how to do it.

Whether you can split the model or not also depends on your UI. For example, does the user identify the domain they want to ask about (departments vs informative systems) before they interact with the chatbot?