EntitySynonymMapper apparently needs word stems instead of words

I am using the default pipeline as advised in the RASA documentation (see “The Short Answer” at Choosing a Pipeline), but I could not get EntitySynonymMapper to work properly out of the box.

I defined the following synonym for distance:

synonym:distance

  • km
  • kms
  • kilometre
  • kilometres
  • mile
  • miles
  • distance

for an intent that should recognise questions about distance, such as:

However, when I say “Tell me the kilometres I walked today”, the system (DIETClassifier) identifies “kilomet” as the entity “unit” instead of mapping it to “distance”, as it is supposed to. I understand that the underlying components use word stems to perform natural language understanding, but I do not understand why the DIETClassifier entity extractor returns the word stem instead of the whole word. I also could not find anything in the documentation saying that we are supposed to use word stems in the synonym definition. I think this is a bug; please correct me if I am wrong.

To fix all the synonyms in my current version, please let me know exactly which word stemmer the system uses. I noticed that the Lancaster stemmer converts “kilometres” to “kilomet”. Is that the one being used?

Thank you

Just to add to this topic: when using the SpaCy tokenizer/featurizer, this issue does not seem to happen, so I guess it is related to the output of the ConveRT tokenizer/featurizer. There is a chance that other components in the pipeline also use the output of ConveRT incorrectly; please investigate that further. I am using RASA 1.10.8, in case that matters.

Can you revert your config back?

I reverted the config to use the ConveRT tokenizer/featurizer instead of SpaCy and retrained RASA. The entity is again captured as “kilomet” instead of “kilometres” and is not handled by the EntitySynonymMapper (unless I explicitly add “kilomet” to the synonym list). I also tried the American spelling “kilometers”, and it produced the same incorrect behaviour.
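
To illustrate why the truncated value slips through, here is a minimal sketch of a synonym lookup. It assumes the mapper does an exact, case-insensitive match on the extracted entity value. That is my assumption about the behaviour, not the actual RASA source, and the names `build_synonym_lookup` and `map_entity_value` are made up for this example.

```python
from typing import Dict, List


def build_synonym_lookup(synonyms: Dict[str, List[str]]) -> Dict[str, str]:
    """Flatten {canonical: [variants]} into {variant: canonical}."""
    lookup: Dict[str, str] = {}
    for canonical, variants in synonyms.items():
        for variant in variants:
            # Assumption: matching is case-insensitive but otherwise exact.
            lookup[variant.lower()] = canonical
    return lookup


def map_entity_value(value: str, lookup: Dict[str, str]) -> str:
    """Return the canonical value, or the input unchanged if it is not listed."""
    return lookup.get(value.lower(), value)


# The synonym list from my training data.
SYNONYMS = {
    "distance": ["km", "kms", "kilometre", "kilometres", "mile", "miles", "distance"],
}
LOOKUP = build_synonym_lookup(SYNONYMS)

print(map_entity_value("kilometres", LOOKUP))  # whole word: mapped to "distance"
print(map_entity_value("kilomet", LOOKUP))     # truncated value: left as "kilomet"
```

With a lookup like this, the extractor has to return the whole word for the mapping to apply, which would explain why adding “kilomet” to the synonym list works around the problem.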