Handling Spelling Mistakes in NLU

Hi there,

How can we handle spelling mistakes in general for better intent classification? I am using the intent classifier in the tensorflow pipeline, and it does not generalize well to inputs with spelling mistakes, even when they are close to examples in the training data. Can anyone suggest a way to handle this? Thanks!

You can add a spell checker to your pipeline by creating a custom component (Custom Components) that corrects the mistakes before classification.

Another way might be to write a script that takes your nlu file to create additional examples based on your original ones, but including spelling errors.
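
A rough sketch of what such a script could look like, in case it helps. It works on a plain list of example strings and leaves out reading and writing the NLU file, since that depends on the format you use. The names `add_typo` and `augment` are made up for illustration, and the three typo types (swap, drop, duplicate a character) are just an assumption about what common mistakes look like:

```python
import random


def add_typo(text, rng):
    """Return text with one random character-level edit near a random position."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    kind = rng.choice(["swap", "drop", "double"])
    if kind == "swap":
        # Transpose the characters at positions i and i + 1.
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    if kind == "drop":
        # Delete the character at position i.
        return text[:i] + text[i + 1:]
    # Duplicate the character at position i.
    return text[:i] + text[i] + text[i:]


def augment(examples, per_example=2, seed=0):
    """Return the original examples plus up to per_example noisy variants of each."""
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    out = []
    for example in examples:
        out.append(example)
        seen = {example}
        # Bounded number of attempts, because some edits produce duplicates.
        for _ in range(per_example * 5):
            if len(seen) > per_example:
                break
            noisy = add_typo(example, rng)
            if noisy not in seen:
                seen.add(noisy)
                out.append(noisy)
    return out


if __name__ == "__main__":
    for line in augment(["book a flight", "cancel my order"]):
        print(line)
```

You would then append the generated lines under the same intent as the original example. Random edits like these will not cover every real-world typo, so this is probably best combined with real user messages.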

Having an NLU trained on all possible typos is going to be annoying, if not unsustainable. Are you really going to think of all the misspellings, cultural memes, and dialect variations needed to create that NLU training data? I couldn't, so I had to use a correction service. My implementation only uses Slack, so it was simple to put a Levenshtein-distance-based spell checker directly into the Python Slack channel message handler, but I think a more general and sustainable approach is to use a custom component, as @huberrom mentioned.
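
For anyone curious, here is a minimal sketch of a Levenshtein-distance-based corrector along those lines. It is not the exact code from my handler: the function names, the `max_distance` threshold, and the idea of building the vocabulary from the words in your training examples are all assumptions for illustration, and the Slack side is left out entirely.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def correct_word(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or word itself if none is close enough."""
    if word in vocabulary:
        return word
    # Sort so that ties are broken the same way on every run.
    best = min(sorted(vocabulary), key=lambda v: levenshtein(word, v), default=word)
    return best if levenshtein(word, best) <= max_distance else word


def correct_message(text, vocabulary, max_distance=2):
    """Lowercase the message, split on whitespace, and correct each token."""
    return " ".join(
        correct_word(token, vocabulary, max_distance)
        for token in text.lower().split()
    )


if __name__ == "__main__":
    # Hypothetical vocabulary, e.g. collected from the NLU training examples.
    vocab = {"book", "a", "flight", "cancel", "my", "order"}
    print(correct_message("Bok a flihgt", vocab))
```

One caveat: with a fixed threshold of 2, short words get "corrected" too eagerly (almost any two-letter word is within distance 2 of another), so you may want to scale the threshold with word length or skip very short tokens.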
