Lemmatization with Synonyms

rasauser1234 · March 23, 2022, 1:18pm

I am trying to leverage the Spacy lemmatizer to ensure that I can use the lemma of a word when using synonyms. For instance, I have set the below synonyms:

- synonym: submit
  examples: |
    - initiate
    - generate
    - raise

When I input this sample query: “I cannot seem to generate a request”, Rasa correctly identifies “generate” as an entity with the value “submit”. However, if I write “I am having an issue generating a request”, Rasa identifies the entity value as “generating” rather than “submit”. I have not annotated my training data with any examples for “generating”, so I am not sure why:

1.) The synonym “submit” is not generated

2.) A new entity value of “generating” is created

Here is my config (I tried to use the Spacy lemmatizer, but it does not seem to make a difference):

pipeline:
- name: SpacyNLP
   model: "en_core_web_md"
   case_sensitive: False
- name: SpacyTokenizer
   use_lemma: True
   intent_tokenization_flag: False
   intent_split_symbol: "_"
- name: RegexEntityExtractor
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
   analyzer: char_wb
   min_ngram: 1
   max_ngram: 4
   use_lemma: True
- name: DIETClassifier
   epochs: 100
- name: EntitySynonymMapper
- name: ResponseSelector
   epochs: 100

Thank you very much

rasauser1234 · March 28, 2022, 6:01pm

Would my only option be to include all of the different forms of a word in the synonyms? Such as:

- synonym: submit
  examples: |
    - initiate
    - initiating
    - generate
    - generating
    - raise
    - raising

ChrisRahme · March 28, 2022, 6:16pm

To my knowledge, yes (I may be wrong)

You even need to consider typos.

Very impractical, but you can always build your own custom component.

rasauser1234 · March 29, 2022, 4:45pm

Thanks for the reply. I was laboring under the assumption that the Spacy Tokenizer component derived the lemma from incoming words, which it seems is not the case. So would I need to create a custom component in order to invoke Spacy lemmatization explicitly?
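For illustration, the lookup such a component would perform could look roughly like the sketch below. This is not Rasa's component API: `normalize_entity`, `SYNONYMS`, and `toy_lemmatize` are hypothetical names, and the toy lemma table only stands in for a real lemmatizer (for example, a function that returns spaCy's `token.lemma_`).

```python
from typing import Callable, Dict

# Synonym table taken from the NLU data in this thread:
# surface form -> canonical entity value.
SYNONYMS: Dict[str, str] = {
    "initiate": "submit",
    "generate": "submit",
    "raise": "submit",
}


def normalize_entity(value: str, lemmatize: Callable[[str], str]) -> str:
    """Map an extracted entity value to its canonical synonym.

    The raw value is tried first, then its lemma. If neither is a known
    synonym, the original value is returned unchanged.
    """
    key = value.lower()
    if key in SYNONYMS:
        return SYNONYMS[key]
    lemma = lemmatize(key).lower()
    return SYNONYMS.get(lemma, value)


# Stand-in lemmatizer for demonstration only (an assumption, not spaCy).
# In practice this would be replaced by a call into a real lemmatizer.
TOY_LEMMAS: Dict[str, str] = {
    "initiating": "initiate",
    "generating": "generate",
    "raising": "raise",
}


def toy_lemmatize(word: str) -> str:
    return TOY_LEMMAS.get(word, word)


if __name__ == "__main__":
    print(normalize_entity("generating", toy_lemmatize))
    print(normalize_entity("request", toy_lemmatize))
```

The lemmatizer is passed in as a callable so the synonym lookup stays independent of whichever NLP library supplies the lemmas.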

ygdo · May 25, 2022, 8:56am

Hello, my team and I ran into the same issue. We used a custom action instead: it reads the slot value and the synonyms, then uses the difflib package to map the extracted word to the closest synonym and from there to the exact entity value. Try it out, it works very well for me. You only need to hard-code the synonym values in your action.py and maintain them, since I don't think custom actions can access the NLU data (they can access the domain, though).
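A minimal sketch of the difflib mapping described above, assuming the synonym table is hard-coded and the extracted word is already available as a string. The names `SYNONYMS` and `map_to_entity_value` and the 0.6 cutoff are illustrative assumptions, not the actual code from that project. Inside a custom action the input would typically come from something like `tracker.get_slot(...)`.

```python
import difflib
from typing import Dict, Optional

# Hard-coded synonym table (from the NLU data in this thread):
# known surface form -> exact entity value.
SYNONYMS: Dict[str, str] = {
    "submit": "submit",
    "initiate": "submit",
    "generate": "submit",
    "raise": "submit",
}


def map_to_entity_value(extracted: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the entity value of the closest known synonym, or None.

    difflib.get_close_matches ranks candidates by SequenceMatcher ratio
    and drops any candidate scoring below `cutoff`.
    """
    matches = difflib.get_close_matches(
        extracted.lower(), list(SYNONYMS), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return SYNONYMS[matches[0]]


if __name__ == "__main__":
    for word in ("generating", "raising", "genrate", "xyz"):
        print(word, "->", map_to_entity_value(word))
```

Because the match is purely character-based, this also tolerates some typos, but a low cutoff can map unrelated words onto a synonym, so the threshold is worth tuning against real user input.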
