Hello guys, I have already found a lot of questions on this topic, but none of them provided a good strategy or best practice for my specific problem, so I made a new post. Maybe some of you have already faced a similar problem and found a good solution. I have an intent greet, which contains a lot of German greetings like “Hallo” as well as multilingual greetings like “Hi”. All of them work fine, but if a user greets with, for example, hiii, the bot can't classify the input as the intent greet.
So I have two ideas to get rid of this problem:
First, I add some misspelled greetings to my training examples, for example hii, helloo or hellloo. The bot then classifies these kinds of inputs correctly.
Second, I implement a string-distance function or a spell checker that corrects wrong user inputs into correct ones which the bot can interpret.
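To make the second idea more concrete, here is a minimal sketch of what I mean by a string-distance function. It is plain Python with no libraries; the greeting list, the function names and the threshold of 2 are just assumptions for illustration, not something from my real bot.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            replace_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, replace_cost))
        previous = current
    return previous[-1]


# Hypothetical list of known greetings (assumption, not my real training data).
KNOWN_GREETINGS = ["hallo", "hi", "hello", "moin", "servus"]


def closest_greeting(token: str, max_distance: int = 2):
    """Return the nearest known greeting, or None if nothing is close enough."""
    token = token.lower()
    best = min(KNOWN_GREETINGS, key=lambda greeting: levenshtein(token, greeting))
    return best if levenshtein(token, best) <= max_distance else None
```

With this, hiii would be mapped to hi and helllo to hello. A fixed threshold is probably too lenient for very short words, though, since almost any two-letter word is within distance 2 of hi.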
We have a lot of local greetings here in Germany that the bot should also know to make it more user-friendly. Option 1 would be very time-consuming and inefficient, because I would have to add all possible typos for every greeting.
By the way, I am using the spaCy German medium language model (de_core_news_md) to train my chatbot. If I take a quick look in the vocab folder inside the de_core_news_md folder, I see that several greetings are included, and some of them are misspelled. I still have trouble understanding how the model works. So both words, “Hello” and “Helllo”, are known to the model, but it does not know that they are the same.
In conclusion, I don't know the best way to solve this problem. The only option, in my opinion, is to put a string-distance function, for example with the Python library fuzzywuzzy, between the user input and the chatbot to catch most of the typos.
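Roughly, the layer between the user input and the chatbot could look like the sketch below. To keep it self-contained I used difflib from the Python standard library instead of fuzzywuzzy, so this is a stand-in and not the fuzzywuzzy API. The greeting list, the function name normalize_message and the cutoff of 0.6 are assumptions for illustration.

```python
import difflib

# Hypothetical canonical greetings (assumption). Single words only,
# because the matching below works token by token.
CANONICAL_GREETINGS = ["hallo", "hi", "hello", "moin", "servus"]


def normalize_message(message: str, cutoff: float = 0.6) -> str:
    """Replace each token that closely resembles a known greeting.

    The message is lowercased first; tokens without a close match
    are passed through unchanged.
    """
    corrected = []
    for token in message.lower().split():
        # get_close_matches returns the best candidates first,
        # or an empty list if none reaches the cutoff.
        matches = difflib.get_close_matches(
            token, CANONICAL_GREETINGS, n=1, cutoff=cutoff
        )
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)


# The corrected text is what would be handed to the intent classifier.
cleaned = normalize_message("Helllo")
```

The cutoff needs tuning: if it is too low, short normal words that happen to look like a greeting get rewritten as well, so I would probably only apply this to the first one or two tokens of a message.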
Any other or better ideas on how to handle this simple problem in the most efficient way? Thanks in advance!