Handling Spelling Mistakes in NLU

Hi there,

How can we handle spelling mistakes in general for better intent classification? I am using the intent classifier in the tensorflow pipeline, and it does not generalize well to inputs with spelling mistakes, even when they are close to examples in the training data. Can anyone suggest a way to handle this? Thanks!


You can add a spell checker to your pipeline by creating a custom component (Custom Components) that corrects the mistakes before classification.
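To make the idea concrete, here is a minimal sketch of the correction step only, using Python's standard `difflib`. The function names `build_vocabulary` and `correct_text` and the `cutoff` value are my own assumptions for illustration, not part of any NLU library. Wiring it into a custom component depends on your framework version, so that part is left out.

```python
import difflib
import re


def build_vocabulary(training_examples):
    """Collect the lowercase words that appear in the training examples."""
    vocab = set()
    for example in training_examples:
        vocab.update(re.findall(r"[a-z']+", example.lower()))
    return sorted(vocab)


def correct_text(text, vocabulary, cutoff=0.8):
    """Replace each unknown word with its closest vocabulary word, if any.

    `cutoff` is a similarity threshold between 0 and 1 (an assumed default);
    words with no sufficiently similar match are left unchanged.
    """
    known = set(vocabulary)
    corrected = []
    for word in text.lower().split():
        if word in known:
            corrected.append(word)
            continue
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else word)
    return " ".join(corrected)


vocabulary = build_vocabulary(["book a flight to london", "cancel my booking"])
print(correct_text("bok a flihgt to lndon", vocabulary))
```

Building the vocabulary from your own training examples means corrections land on words the classifier has actually seen. The trade-off is that legitimate out-of-vocabulary words (names, new entities) may get "corrected" if they happen to resemble a known word, so the cutoff probably needs tuning.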


Another way might be to write a script that takes your nlu file to create additional examples based on your original ones, but including spelling errors.
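A rough sketch of such a script could look like the following. It works on a plain list of example strings rather than a specific NLU file format, and the names `add_typo` and `augment`, as well as the three typo types (delete, duplicate, swap), are assumptions chosen for illustration. If your examples contain inline entity annotations, you would need to protect those from being modified.

```python
import random


def add_typo(text, rng):
    """Return `text` with one random character-level typo applied."""
    positions = [i for i, ch in enumerate(text) if ch.isalpha()]
    if not positions:
        return text
    i = rng.choice(positions)
    kind = rng.choice(["delete", "duplicate", "swap"])
    if kind == "delete":
        return text[:i] + text[i + 1:]
    if kind == "duplicate":
        return text[:i] + text[i] + text[i:]
    # Swap with the next character, or the previous one at the end of the text.
    j = i + 1 if i + 1 < len(text) else i - 1
    if j < 0:
        return text
    chars = list(text)
    chars[i], chars[j] = chars[j], chars[i]
    return "".join(chars)


def augment(examples, per_example=2, seed=0):
    """Generate up to `per_example` distinct misspelled variants per example."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    augmented = []
    for example in examples:
        variants = set()
        for _ in range(per_example * 10):  # bounded number of attempts
            if len(variants) >= per_example:
                break
            candidate = add_typo(example, rng)
            if candidate != example:
                variants.add(candidate)
        augmented.extend(sorted(variants))
    return augmented


for line in augment(["cancel my order", "book a flight"], per_example=2):
    print(line)
```

The generated variants would then be appended to the same intent as the original example before retraining.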


Having an NLU model trained on all possible typos is going to be annoying, if not unsustainable: are you really going to think of all the misspellings, cultural memes, and dialect variations needed to create the NLU training data? I couldn’t, so I had to use a correction service. My implementation only uses Slack, so it was easy to put a simple Levenshtein-distance-based spell checker directly into the Python Slack channel message handler, but I think a more general and sustainable approach is to use a custom component, as @huberrom mentioned.
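For anyone curious what a Levenshtein-distance-based checker involves, here is a minimal sketch. It is not the code from my handler; the names `levenshtein` and `closest_word` and the `max_distance` default are assumptions for illustration, and the Slack side is omitted entirely.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def closest_word(word, vocabulary, max_distance=2):
    """Return the nearest vocabulary word within `max_distance`, else `word`."""
    best, best_distance = word, max_distance + 1
    for candidate in vocabulary:
        distance = levenshtein(word, candidate)
        if distance < best_distance:  # strict: the first of tied candidates wins
            best, best_distance = candidate, distance
    return best


print(closest_word("helo", ["hello", "world"]))
```

Each incoming word can be passed through `closest_word` before the message text reaches the classifier. Note that plain Levenshtein distance counts a transposition such as "flihgt" as two edits, so a `max_distance` of 1 would miss swapped letters.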
