Typo handling best practice

lindig · August 6, 2019, 1:09pm

Hello guys, I already found a lot of questions on this topic, but none of them provided a good strategy or best practice for my specific problem, so I made a new post. Maybe some of you have already faced a similar problem and found a good solution. I have an intent greet, which contains a lot of German greetings like “Hallo” as well as multi-language greetings like “Hi”. All of them work fine, but if a user greets with, for example, hiii, the bot can't classify the input as the intent greet.

So I have two ideas to get rid of this problem:

First, I add some misspelled greetings to my training examples, for example hii, helloo or hellloo. The bot then classifies these kinds of input correctly.

Second, I implement a string-distance function or a spell checker that corrects wrong user input into a form the bot can interpret.

We have a lot of local greetings here in Germany which the bot should also know, to make it more user-friendly. Option 1, adding all possible typos of every greeting by hand, would be very time-consuming and inefficient.
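If the first option is still worth trying, the misspelled examples do not have to be typed by hand. Below is a minimal sketch that generates "stretched" variants of a greeting (hii, hiii, halloo, ...) which could then be pasted into the training data. The function name elongated_variants and the max_repeat parameter are made up for illustration, and this only covers repeated letters, not other kinds of typos.

```python
def elongated_variants(word, max_repeat=3):
    """Return spellings of `word` in which one letter is repeated
    2..max_repeat times, e.g. "hi" -> "hii", "hiii", "hhi", "hhhi"."""
    variants = set()
    for i, ch in enumerate(word):
        if not ch.isalpha():
            continue  # do not stretch spaces or punctuation
        for n in range(2, max_repeat + 1):
            variants.add(word[:i] + ch * n + word[i + 1:])
    variants.discard(word)  # never return the original spelling
    return sorted(variants)


if __name__ == "__main__":
    # Hypothetical list of greetings taken from the greet intent
    for greeting in ["hi", "hallo", "moin"]:
        for variant in elongated_variants(greeting):
            print(f"- {variant}")
```

The printed lines use a "- example" list style so they can be copied into an intent's example list; adjust the output to whatever training data format you use.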

By the way, I am using the spaCy German medium language model to train my chatbot. If I take a quick look at the vocab folder inside the de_core_news_md folder, I see that several greetings are included, some of them misspelled. I still have trouble understanding how the model works. So both “Hello” and “Helllo” are known to the model, but it does not know that they are the same word.

In conclusion, I don't know the best way to solve this problem. The only option, in my opinion, is to put a string-distance function, for example with the Python library fuzzywuzzy, between the user input and the chatbot to filter out most of the typos.
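A rough sketch of that idea, for illustration only: it uses difflib from the Python standard library instead of fuzzywuzzy so that it runs without extra installs, and the greeting list, the function names and the cutoff of 0.75 are all assumptions that would need tuning on real data. Runs of three or more identical characters are first shrunk to two (so "hiii" becomes "hii" while the legitimate double l in "hallo" survives), then the result is matched against the known greetings.

```python
import difflib
import re

# Hypothetical list; in practice this would come from the greet intent examples
KNOWN_GREETINGS = ["hallo", "hi", "hey", "moin", "servus", "guten tag"]


def collapse_repeats(text):
    """Shrink runs of 3+ identical characters to 2, e.g. "halllo" -> "hallo"."""
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


def correct_greeting(user_input, cutoff=0.75):
    """Return the closest known greeting, or the input unchanged if
    nothing is similar enough (similarity ratio below `cutoff`)."""
    cleaned = collapse_repeats(user_input.strip().lower())
    matches = difflib.get_close_matches(cleaned, KNOWN_GREETINGS, n=1, cutoff=cutoff)
    return matches[0] if matches else user_input


if __name__ == "__main__":
    for text in ["hiii", "Halllo", "pizza bestellen"]:
        print(text, "->", correct_greeting(text))
```

Inputs that do not resemble any greeting are passed through untouched, so the rest of the NLU pipeline still sees them as typed. Correcting only against a small whitelist like this keeps the risk of "fixing" a correct word into a wrong one low, but it will not help with typos in other intents.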

Any other/better ideas for handling this simple problem in the most efficient way? Thanks in advance!

tyd · August 7, 2019, 9:17am

Hi @lindig! Have you thought about building a custom NLU component (docs and blog post)?

Perhaps this thread on spell checking might provide some alternative ideas.

lindig · August 12, 2019, 6:20am

Hi @tyd! Yes, I think this is maybe the best idea. I will have a look at it. Thank you for the links.
