Entity synonyms in training data

Hey, just like we can include files for lookup data, can we also include files for entity synonyms instead of typing them out separately?


Hi Asya. I don’t believe we currently support loading synonyms from a file, but the easiest way to get that data in would be via an entity_synonyms array in your JSON data file. Info here: https://rasa.com/docs/nlu/master/dataformat/#entity-synonyms

I don’t know what format your synonyms file has, but it shouldn’t be too difficult to transform it into this format to paste in your data file.
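As a rough illustration of that transformation, here is a minimal sketch that assumes the synonyms file is a simple CSV with one `value,synonym` pair per line. That input layout, the sample data, and the helper name `build_entity_synonyms` are assumptions made for the example; only the `entity_synonyms` structure (a `value` plus a list of `synonyms`) comes from the data format docs linked above.

```python
import csv
import io
import json
from collections import defaultdict


def build_entity_synonyms(rows):
    """Group (value, synonym) pairs into an entity_synonyms list."""
    grouped = defaultdict(list)
    for row in rows:
        if len(row) != 2:
            continue  # ignore blank or malformed lines
        value, synonym = row[0].strip(), row[1].strip()
        # Skip empty synonyms and duplicates, keep first-seen order.
        if synonym and synonym not in grouped[value]:
            grouped[value].append(synonym)
    return [{"value": v, "synonyms": s} for v, s in grouped.items()]


# Hypothetical input; replace with csv.reader(open(...)) for a real file.
sample = "fr,french\nfr,French\nen,english\nen,English\n"
rows = list(csv.reader(io.StringIO(sample)))

data = {"rasa_nlu_data": {"entity_synonyms": build_entity_synonyms(rows)}}
print(json.dumps(data, indent=2))
```

The printed `entity_synonyms` array can then be merged into the existing JSON training data file next to `common_examples`.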


@erohmensing

That is how I am currently doing it, but my data is growing beyond what I can handle this way. Thanks anyway! :smile:

@Asya - not sure if this is what you are looking for. You can create a .md file with entries like the one shown below. When training NLU, you can give the path to a folder that contains all the NLU data files (.md or .json).

## synonym:fr
- french
- French
- french language
- French Language

@apurva So for multiple entities, shouldn’t I be giving separate files for each one?

"entity_synonyms": [
  {
    "value": "french",
    "synonyms": "data/french.md"
  },
  {
    "value": "english",
    "synonyms": "data/english.md"
  }
]

or can I just give a single file with all the entities in it?

@Asya both will work (provided you point to the folder and not the individual files during training). If you have different functional areas with many synonyms, keeping separate files will be easier to manage; if you have only a few synonyms, a single file will do as well. Example:

## synonym:en
- english
- English
- english language
- English Language
- en_us
- en_uk

## synonym:fr
- french
- French
- fr
- FR
- french language
- French Language
- fr_FR
- fr_fr

## synonym:somethingelse
- …
- …

Also, you don’t need to put that in a JSON file; just point to the folder in which all the .md files are located. For example, if all your files (.md or .json) are located at c:/nlu, you can point to it as shown below:

python -m rasa_nlu.train -c config/nlu_model_config.yml --data c:/nlu -o models --fixed_model_name my_nlu --project my_current --verbose
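If the synonym list keeps growing, the Markdown files could be generated by a script rather than typed by hand. The sketch below is only an illustration: it assumes the synonyms are already available as a Python dict, writes one file per entity value using the `## synonym:<value>` heading and `- ` list items of the Rasa NLU Markdown format, and uses made-up names (`synonyms_to_markdown`, `write_synonym_files`, the `synonyms_<value>.md` file naming).

```python
from pathlib import Path
import tempfile


def synonyms_to_markdown(synonyms):
    """Render {value: [variants]} as Markdown synonym sections."""
    sections = []
    for value, variants in synonyms.items():
        lines = ["## synonym:" + value]
        lines += ["- " + variant for variant in variants]
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"


def write_synonym_files(synonyms, out_dir):
    """Write one .md file per entity value into out_dir (hypothetical layout)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for value, variants in synonyms.items():
        path = out_dir / f"synonyms_{value}.md"
        path.write_text(synonyms_to_markdown({value: variants}), encoding="utf-8")
        paths.append(path)
    return paths


# Example data taken from the lists above; a temp folder stands in for c:/nlu.
synonyms = {
    "en": ["english", "English", "en_us", "en_uk"],
    "fr": ["french", "French", "fr_FR", "fr_fr"],
}
with tempfile.TemporaryDirectory() as tmp:
    for path in write_synonym_files(synonyms, tmp):
        print(path.name)
        print(path.read_text(encoding="utf-8"))
```

Calling `synonyms_to_markdown(synonyms)` on the whole dict instead gives the single-file variant; either way, the output folder is what gets passed to `--data`.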

@apurva Thanks! will check it out! :+1: