Typo handling best practice

Hello guys, I have already found a lot of questions on this topic, but none of them provided a good strategy or best practice for my specific problem, so I am making a new post. Maybe some of you have already faced a similar problem and found a good solution.

I have an intent greet, which contains a lot of German greetings like “Hallo” as well as multi-language greetings like “Hi”. All of them work fine, but if a user greets with, for example, “hiii”, the bot can't classify the input as the intent greet.

I have two ideas for solving this problem:

First, I add some misspelled greetings to my training examples, for example “hii”, “helloo” or “hellloo”. The bot then classifies these kinds of inputs correctly.

Second, I implement a string-distance function or a spell checker that corrects misspelled user inputs into forms the bot can interpret.
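A minimal sketch of what such a string-distance function could look like, using plain Levenshtein edit distance. The greeting list, the function names and the `max_distance` threshold are assumptions made for illustration; they are not taken from the bot described above.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Hypothetical list of known greetings, for illustration only.
KNOWN_GREETINGS = ["hallo", "hi", "hello", "moin", "servus"]


def closest_greeting(word: str, max_distance: int = 2):
    """Return the nearest known greeting, or None if nothing is close enough."""
    word = word.lower()
    best = min(KNOWN_GREETINGS, key=lambda greeting: levenshtein(word, greeting))
    return best if levenshtein(word, best) <= max_distance else None


print(closest_greeting("hiii"))
print(closest_greeting("helllo"))
print(closest_greeting("pizza"))
```

One caveat with this sketch: a fixed threshold of 2 is very permissive for short greetings, since any two-letter word is within two edits of “hi”. A threshold that scales with word length would probably be needed in practice.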

We have a lot of regional greetings here in Germany which the bot should also know, to make it more user-friendly. With that many greetings, option 1 would be very time-consuming and inefficient, because I would have to add every possible typo of every greeting by hand.
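If option 1 is pursued anyway, the misspelled examples could be generated instead of typed by hand. The following is only a sketch under that assumption: `elongated_variants` is a hypothetical helper, and it covers just one kind of typo (a letter repeated too often), not swapped or missing letters.

```python
from typing import List


def elongated_variants(word: str, max_repeat: int = 3) -> List[str]:
    """Return variants of word in which one letter is repeated up to max_repeat times."""
    variants = set()
    for index, char in enumerate(word):
        for count in range(2, max_repeat + 1):
            # Replace the single letter at this position with a run of that letter.
            variants.add(word[:index] + char * count + word[index + 1:])
    return sorted(variants)


# Hypothetical greetings; the output could be pasted into the training examples.
for greeting in ["hi", "hallo"]:
    for variant in elongated_variants(greeting):
        print(variant)
```

The number of generated examples still grows with every greeting, so this reduces the typing effort but not the size of the training data.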

By the way, I am using the spaCy German medium language model to train my chatbot. If I take a quick look in the vocab folder inside the de_core_news_md folder, I see that several greetings are included, some of them misspelled. I still have trouble understanding how the model works: although both “Hello” and “Helllo” are known to the model, it does not know that they mean the same thing.

In conclusion, I don't know the best way to solve this problem. The only option I see is to put a string-distance function, for example from the Python library fuzzywuzzy, between the user input and the chatbot to catch most of the typos.
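A rough sketch of such a preprocessing step between the user input and the bot. To keep it free of third-party dependencies it uses `difflib` from the Python standard library instead of fuzzywuzzy; fuzzywuzzy's `process.extractOne` could presumably play the same role. The greeting list, the function names and the `cutoff` value are assumptions for illustration.

```python
import difflib
import re

# Hypothetical list of known greetings, for illustration only.
KNOWN_GREETINGS = ["hallo", "hi", "hello", "moin", "servus"]


def squeeze_repeats(word: str) -> str:
    """Collapse runs of three or more identical characters to a single one."""
    return re.sub(r"(.)\1{2,}", r"\1", word)


def normalize_message(text: str, cutoff: float = 0.75) -> str:
    """Replace tokens that look like misspelled greetings; leave other tokens untouched."""
    corrected = []
    for token in text.split():
        lowered = token.lower()
        match = None
        # Try the token as typed first, then with long letter runs collapsed.
        for candidate in (lowered, squeeze_repeats(lowered)):
            found = difflib.get_close_matches(
                candidate, KNOWN_GREETINGS, n=1, cutoff=cutoff
            )
            if found:
                match = found[0]
                break
        corrected.append(match if match else token)
    return " ".join(corrected)


print(normalize_message("hiii"))
print(normalize_message("Helllo wie gehts"))
```

The corrected text would then be passed to the bot in place of the raw input. Choosing the cutoff is a trade-off: a lower value catches more typos but also rewrites more words that were never meant as greetings.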

Are there any other or better ideas for handling this simple problem as efficiently as possible? Thanks in advance!

Hi @lindig! Have you thought about building a custom NLU component (docs and blog post)?

Perhaps this thread on spell checking might provide some alternative ideas.


Hi @tyd! Yes, I think this may be the best idea. I will have a look at it. Thank you for the links :slight_smile:
