Handling Spelling Mistakes in NLU

Rajskc · February 15, 2019, 9:37am

Hi there,

How can we handle spelling mistakes in general for better intent classification? I am using the intent classifier in the tensorflow pipeline, and it does not generalize well to inputs with spelling mistakes, even ones close to those in the training data. Can anyone suggest a way to handle this? Thanks!

huberrom · February 15, 2019, 10:25am

You can add a spell checker to your pipeline if you create a custom component (Custom Components) to correct the mistakes before the classification.

mauricedoepke · February 15, 2019, 1:56pm

Another way might be to write a script that takes your nlu file to create additional examples based on your original ones, but including spelling errors.
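A minimal sketch of such a script, assuming the examples have already been read out of the NLU file as plain strings. The names `add_typo` and `augment` are made up for illustration, and the three typo types (swap, drop, double) are only one possible choice:

```python
import random


def add_typo(text: str, rng: random.Random) -> str:
    """Return `text` with one random character-level typo applied."""
    if len(text) < 2:
        return text
    i = rng.randrange(len(text) - 1)
    kind = rng.choice(["swap", "drop", "double"])
    if kind == "swap":
        # Transpose two neighbouring characters.
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]
    if kind == "drop":
        # Delete one character.
        return text[:i] + text[i + 1:]
    # Duplicate one character.
    return text[:i] + text[i] + text[i:]


def augment(examples, per_example=2, seed=0):
    """Return the original examples plus up to `per_example` noisy copies of each."""
    rng = random.Random(seed)
    out = []
    for example in examples:
        out.append(example)
        for _ in range(per_example):
            noisy = add_typo(example, rng)
            # Skip no-op typos (e.g. swapping two equal letters) and duplicates.
            if noisy != example and noisy not in out:
                out.append(noisy)
    return out


if __name__ == "__main__":
    for line in augment(["book a flight", "what is the weather"]):
        print(line)
```

Note that this treats each example as raw text, so entity annotations inside an example would get corrupted as well; those would need to be skipped or handled separately.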

mark_collins · February 15, 2019, 2:06pm

Having an NLU trained on all possible typos is going to be annoying if not unsustainable; are you really going to think of all the misspellings, cultural memes, and dialect variations to even create the NLU training data? I couldn’t, so I had to use a correction service. For my implementation I only use Slack, so it was easy to put a simple Levenshtein-distance-based spell checker directly into the Python Slack channel message handler, but I think a more generalised and sustainable way is to use a custom component, as @huberrom mentioned.
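For illustration, a Levenshtein-distance-based corrector could look roughly like the sketch below. This is not the code from the Slack handler described above; the function names, the tiny vocabulary, and the `max_distance=2` threshold are all assumptions:

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def correct_word(word, vocabulary, max_distance=2):
    """Return the closest vocabulary word, or the word itself if none is close."""
    if not vocabulary or word in vocabulary:
        return word
    best = min(vocabulary, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_distance else word


def correct_text(text, vocabulary, max_distance=2):
    """Lowercase the text, split on whitespace, and correct word by word."""
    words = text.lower().split()
    return " ".join(correct_word(w, vocabulary, max_distance) for w in words)


if __name__ == "__main__":
    # Hypothetical vocabulary; in practice it would come from the training data.
    vocab = ["i", "want", "to", "book", "a", "flight", "hotel"]
    print(correct_text("I want to bok a flihgt", vocab))
```

With a threshold of 2, short words are easy to miscorrect (for example, "bok" is only two edits away from "to"), so the threshold and the vocabulary need some care.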

Nikhilcheke · October 13, 2020, 4:15am

You can write a custom component for this task instead of adding spelling mistakes to the training data.

You can refer to this article:

Custom component for Spell checking

pradeepbatchu · February 22, 2021, 11:38am

Hi, I have tried following the steps in Custom component for Spell checking, but it’s throwing an error on message.text with Rasa 2.0. You can get the text using message["text"], but I still couldn’t find a way to set the text to new_message (message["text"] = new_message doesn’t work). Can anyone please help with this?

daammon · February 25, 2021, 7:36am

Hi Pradeep,

This worked for me for overwriting the message: message.set('text', new_message, add_to_output=True)

joseferrerglobant · March 30, 2021, 12:17pm

Hi! You must use message.data['text'] instead of message['text'] in Rasa 2.0

joancipria · April 6, 2021, 11:08am

I’m also following the same Medium tutorial as @pradeepbatchu, but I can’t get the message text. I’ve tried the following ways:

using message.get("text") I get 'NoneType' object has no attribute 'split'.

using message.data["text"] I get KeyError: 'text'

using textdata = message["text"] I get TypeError: 'Message' object is not subscriptable

I’ve used them in the following code:

    # textdata = message.text (old way)
    textdata = message.get("text")
    # textdata = message.data["text"] (@joseferrerglobant way)
    
    textdata = textdata.split()
    new_message = ' '.join(spell.correction(w) for w in textdata)
    
    # message.text = new_message (old way)
    message.set('text', new_message, add_to_output=True)

How can I correctly get the message?

koaning · April 6, 2021, 1:25pm

I’m currently doing a bit of research on this topic. My belief is that it’s dangerous to use spell checkers because they will sometimes get it wrong too, especially when you apply them to short sentences.

Instead, I’m trying to figure out if it makes sense to augment the training data beforehand to have spelling errors in it. I’ve done some experiments with nlpaug that look promising. It’s essentially what @mauricedoepke suggests, but what I’m trying to figure out is whether it makes sense to add more training data or whether you can instead apply a --finetune trick. Note that all of this is work in progress and should not be interpreted as something that will “always work”, but it’s worth experimenting.

joseferrerglobant · April 6, 2021, 3:04pm

Hi @joancipria, I’m using Rasa 2.3.4 and message.data["text"] works for me. You should look at what you have inside the “data” dict to check which property holds the message.

You can check the code on Message class:

rasa/message.py at 2.4.x · RasaHQ/rasa (github.com)
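A sketch of that check combined with a guard, under stated assumptions. `FakeMessage` is a hypothetical stand-in that mimics only the `data`, `get` and `set` calls quoted in this thread; it is not Rasa's Message class. One possible explanation for the `KeyError` and the `None` above, not confirmed in this thread, is that some messages passing through the component carry no "text" key at all, in which case skipping them avoids both errors:

```python
class FakeMessage:
    """Hypothetical stand-in for illustration only; not Rasa's Message class."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def get(self, prop, default=None):
        return self.data.get(prop, default)

    def set(self, prop, info, add_to_output=False):
        # add_to_output is accepted but ignored in this stand-in.
        self.data[prop] = info


def correct_message(message, correct_word):
    """Spell-correct the message text in place, skipping messages without text."""
    text = message.get("text")
    if not text:
        # Nothing to correct: the "text" key is missing or empty.
        return message
    new_text = " ".join(correct_word(w) for w in text.split())
    message.set("text", new_text, add_to_output=True)
    return message


if __name__ == "__main__":
    # Toy corrector standing in for a real spell checker.
    fixes = {"helo": "hello"}
    msg = correct_message(FakeMessage({"text": "helo world"}),
                          lambda w: fixes.get(w, w))
    print(sorted(msg.data))   # shows which properties the message holds
    print(msg.get("text"))
```

Printing the keys of `message.data` inside the real component is a quick way to see which property actually holds the text for each message.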

CuFFaz · August 10, 2021, 10:46am

Hey, any updates yet? Building up the training data to handle such spelling exceptions is too much work.

koaning · August 10, 2021, 11:08am

It’s in preview mode now, but you might enjoy playing around with taipo.

ermarkar · August 10, 2021, 12:19pm

Sorry to ping you there, but can you please answer this as well: Not able to call action server using rasa-sdk
