Lemmatization with Synonyms

I am trying to use the spaCy lemmatizer so that synonyms are matched on the lemma of a word. For instance, I have defined the following synonyms:

- synonym: submit
  examples: |
    - initiate
    - generate
    - raise

When I input the sample query “I cannot seem to generate a request”, Rasa correctly identifies “generate” as an entity with value “submit”. However, if I write “I am having an issue generating a request”, Rasa identifies the entity value as “generating” rather than “submit”. I have not annotated my training data with any examples for “generating”, so I am not sure why:

1.) The entity value is not mapped to the synonym “submit”

2.) A new entity value of “generating” is created

Here is my config (I tried to enable spaCy lemmatization, but it does not seem to make a difference):

pipeline:
- name: SpacyNLP
  model: "en_core_web_md"
  case_sensitive: False
- name: SpacyTokenizer
  use_lemma: True
  intent_tokenization_flag: False
  intent_split_symbol: "_"
- name: RegexEntityExtractor
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
  use_lemma: True
- name: DIETClassifier
  epochs: 100
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100

Thank you very much

Would my only option be to include all of the different forms of a word in the synonyms? Such as:

- synonym: submit
  examples: |
    - initiate
    - initiating
    - generate
    - generating
    - raise
    - raising

To my knowledge, yes (I may be wrong).

You even need to consider typos.

That is quite impractical, but you can always build your own custom component.

Thanks for the reply. I was laboring under the assumption that the SpacyTokenizer component derived the lemma from incoming words, which it seems is not the case. Would I need to create a custom component in order to invoke spaCy lemmatization explicitly?
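For what it's worth, here is a minimal sketch of the lookup logic such a custom component could apply. As far as I understand, EntitySynonymMapper looks up the extracted entity text as-is (ignoring case), so “generating” never matches “generate”. The sketch tries the surface form first and then falls back to the lemma. The names `SYNONYMS`, `map_entity_value`, and `spacy_lemmatizer` are hypothetical and not part of Rasa; wiring this into an actual Rasa component is left out because that API differs between Rasa versions.

```python
from typing import Callable, Dict

# Hypothetical lookup table mirroring the synonym data in this thread:
# each surface form points to the canonical entity value.
SYNONYMS: Dict[str, str] = {
    "initiate": "submit",
    "generate": "submit",
    "raise": "submit",
}


def map_entity_value(
    value: str,
    lemmatize: Callable[[str], str],
    synonyms: Dict[str, str] = SYNONYMS,
) -> str:
    """Return the canonical value for an extracted entity.

    Tries the lowercased surface form first, then its lemma.
    If neither is a known synonym, the original value is returned.
    """
    key = value.strip().lower()
    if key in synonyms:
        return synonyms[key]
    lemma = lemmatize(key).lower()
    return synonyms.get(lemma, value)


def spacy_lemmatizer(model_name: str = "en_core_web_md") -> Callable[[str], str]:
    """Build a lemmatize function backed by spaCy.

    Assumes spaCy and the named model are installed; the import is local
    so the mapping function above can be used without spaCy.
    """
    import spacy

    nlp = spacy.load(model_name)

    def lemmatize(text: str) -> str:
        # Join token lemmas so multi-word values are handled as well.
        return " ".join(token.lemma_ for token in nlp(text))

    return lemmatize
```

With spaCy installed, `map_entity_value("generating", spacy_lemmatizer())` should resolve to “submit”, provided the model lemmatizes “generating” to “generate”. Any other callable that maps a string to its lemma can be passed in instead.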


Hello. My team ran into the same issue, and we used a custom action instead. It reads the slot value and the synonyms, then uses the difflib package to map the extracted word back to the closest synonym and from there to the exact entity value. Try it out; it works very well for me. You need to hard-code the synonym values in your action.py and maintain them there, since I think custom actions cannot access the NLU data (though they can access the domain).
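A rough sketch of the difflib approach described above, assuming a hard-coded synonym table. The names `SYNONYMS` and `resolve_entity` and the `cutoff` of 0.6 (difflib's default) are illustrative choices, not the poster's actual code. Inside a custom action, the extracted word would typically come from something like `tracker.get_slot(...)`, and the resolved value would be written back to the slot.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical hard-coded table: canonical entity value -> its synonyms.
SYNONYMS: Dict[str, List[str]] = {
    "submit": ["initiate", "generate", "raise"],
}


def resolve_entity(
    extracted: str,
    synonyms: Dict[str, List[str]] = SYNONYMS,
    cutoff: float = 0.6,
) -> Optional[str]:
    """Map an extracted word to its canonical entity value by fuzzy matching.

    Returns None when no known form is similar enough.
    """
    # Flatten the table so every known form points to its canonical value.
    lookup: Dict[str, str] = {}
    for canonical, variants in synonyms.items():
        lookup[canonical] = canonical
        for variant in variants:
            lookup[variant] = canonical

    # get_close_matches returns the best candidates at or above the cutoff.
    matches = difflib.get_close_matches(
        extracted.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None
```

Because the match is based on character similarity, this also tolerates some typos, but a cutoff that is too low can map unrelated words to an entity value, so it is worth tuning against real user messages.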