Synonyms vs Lookup Tables

Hey, I have a few questions regarding lookup tables and synonyms:

  1. Do I have to add all the synonyms to the lookup table? For example, if I have an entity for treatment names and each treatment has synonyms, do all of them have to be in the lookup table, or is it enough to list them as synonyms for each treatment?

  2. How are synonyms recognized in training? Is it necessary to provide a training example for each of the synonyms in my intents?

  3. If I have a slot set for the treatment name and one of the synonyms is extracted, what will the slot value be: the synonym or the treatment name?

config.yml:


pipeline: 
  - name: "WhitespaceTokenizer"
    case_sensitive: false
    intent_tokenization_flag: true
    intent_split_symbol: "+"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"
    
policies:
  - name: "KerasPolicy"
    epochs: 300
    featurizer:
    - name: MaxHistoryTrackerFeaturizer
      max_history: 5
      state_featurizer:
        - name: BinarySingleStateFeaturizer
  - name: "MemoizationPolicy"
    max_history: 5
  - name: "MappingPolicy"
  - name: "FallbackPolicy"
    nlu_threshold: 0.4
    core_threshold: 0.3
    fallback_action_name: "action_default_fallback"
  - name: "FormPolicy"

Thanks in advance :slight_smile: @akelad

  1. Use synonyms if your objective is to replace different variants of a particular entity with one normalized value for your convenience. Use lookup tables if your objective is to help the model recognize the entity more reliably.
  2. While writing training data, you might as well pretend that the synonyms don’t exist, because the synonym mapping kicks in only after the model has finished extracting the entity. So yes, you need training examples from which the model can learn to extract the synonyms.
  3. Your wording is ambiguous, so I’ll use an example to illustrate it. If the entity value for treatment_name is picked up as chemo, and chemo is listed as one of the synonyms of chemotherapy, then chemotherapy will be the final value of the slot even though the user said chemo.
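
To make points 2 and 3 concrete, here is a toy Python sketch of the idea. It is not Rasa's actual EntitySynonymMapper code; the `SYNONYMS` dictionary and the `normalize_entity` function are hypothetical names made up for illustration. The point it shows is that normalization is a plain lookup applied to a value the extractor has already found, so it cannot help if the extractor never picks up the synonym in the first place.

```python
# Toy illustration only: NOT Rasa's EntitySynonymMapper implementation.
# SYNONYMS and normalize_entity are hypothetical names for this sketch.
SYNONYMS = {
    "chemo": "chemotherapy",
    "chemo therapy": "chemotherapy",
}


def normalize_entity(extracted_value, synonyms=SYNONYMS):
    """Map an already-extracted entity value to its canonical form.

    Values that are not listed as synonyms are returned unchanged.
    """
    return synonyms.get(extracted_value.lower(), extracted_value)


# The extractor found "chemo" in the user message; the slot gets the
# normalized value instead.
slot_value = normalize_entity("chemo")
print(slot_value)  # chemotherapy
```

If the extractor had returned nothing for the message, there would be nothing to normalize, which is why the training data still has to teach the model to extract the synonym itself.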

@msamogh Thank you for helping me out :slight_smile:
I still have a question regarding (1). I understand what you are saying, but does that mean I have to add the synonyms to the lookup table so they can be picked up better? And would I also have to add a training example for each of the synonyms? Wouldn’t that just be repetition?

One more question:

I am trying to extract treatment names from user messages, but the treatment names are often misspelled. For this, is it better to have a spell-check component at the beginning of the pipeline or a lookup table with possible misspellings? Thanks


Yes, you’d need to add the “synonyms” to your lookup table if you want them to be picked up. As for misspellings, if there are only a small number of them, you can again use a lookup table for that. If you would rather have a spell-check component, we don’t provide one out of the box, but you can build your own custom component using something like autocorrect.
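
As a rough sketch of the spell-correction idea, the snippet below snaps a misspelled token to the closest known treatment name. To keep it self-contained it uses `difflib` from the Python standard library instead of the autocorrect package, and it is not wired into a Rasa pipeline. The `KNOWN_TREATMENTS` list, the `correct_treatment` function, and the `0.8` cutoff are all assumptions chosen for illustration.

```python
import difflib

# Hypothetical list of canonical treatment names (e.g. the lookup table entries).
KNOWN_TREATMENTS = ["chemotherapy", "radiotherapy", "immunotherapy"]


def correct_treatment(token, known=KNOWN_TREATMENTS, cutoff=0.8):
    """Return the closest known treatment name, or the token unchanged.

    cutoff is a similarity threshold between 0 and 1; 0.8 is an arbitrary
    starting point that would need tuning on real data.
    """
    matches = difflib.get_close_matches(token.lower(), known, n=1, cutoff=cutoff)
    return matches[0] if matches else token


print(correct_treatment("chemotheraphy"))
print(correct_treatment("aspirin"))
```

The same function could be called from a custom component placed before the entity extractor, or used offline to generate the misspelled variants for a lookup table.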


@msamogh Thanks for the help.