I was just searching these forums for a way to handle typos in Rasa, and found an old thread mentioning a very interesting repo called SymSpell:
Spelling correction & Fuzzy search: **1 million times faster** through Symmetric Delete spelling correction algorithm
The Symmetric Delete spelling correction algorithm reduces the complexity of edit candidate generation and dictionary lookup for a given Damerau-Levenshtein distance. It is six orders of magnitude faster ([than the standard approach with deletes + transposes + replaces + inserts](http://norvig.com/spell-correct.html)) and language independent.
Opposite to other algorithms only deletes are required, no transposes + replaces + inserts.
Transposes + replaces + inserts of the input term are transformed into deletes of the dictionary term.
Replaces and inserts are expensive and language dependent: e.g. Chinese has 70,000 Unicode Han characters!
The speed comes from the inexpensive **delete-only edit candidate generation** and the **pre-calculation**.
An average 5 letter word has about **3 million possible spelling errors** within a maximum edit distance of 3, but SymSpell needs to generate **only 25 deletes** to cover them all, both at pre-calculation and at lookup time. Magic!
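To make the "only 25 deletes" figure concrete, here is a minimal Python sketch of delete-only candidate generation. It is not code from the SymSpell repo; the function name `deletes` and the example words are my own assumptions for illustration. For a 5 letter word with distinct letters, removing 1, 2, or 3 characters gives 5 + 10 + 10 = 25 variants.

```python
from itertools import combinations
from typing import Set


def deletes(word: str, max_distance: int) -> Set[str]:
    """Return every string made by removing 1..max_distance characters from word.

    Hypothetical helper for illustration, not the SymSpell API.
    """
    results = set()
    for k in range(1, min(max_distance, len(word)) + 1):
        # Choose which character positions to keep; order is preserved.
        for keep in combinations(range(len(word)), len(word) - k):
            results.add("".join(word[i] for i in keep))
    return results


# A 5 letter word with distinct letters yields exactly 25 deletes at distance 3.
print(len(deletes("brown", 3)))  # 25

# A transposition typo still shares delete variants with the dictionary term,
# which is how deletes on both sides can stand in for other edit types.
shared = deletes("brown", 3) & deletes("borwn", 3)
print("bwn" in shared)  # True
```

A shared delete only marks a word as a candidate. As far as I understand the README, the real library still verifies candidates against the actual edit distance before returning suggestions.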
I am very intrigued by this repo and am wondering if anyone here has experience implementing it with Rasa. I cannot see any documentation in the repo itself, and cannot find any examples on Google.
As always, thank you for helping me out!!
SymSpell indeed looks like a very interesting solution. I have not tried it myself, but I am very interested in doing so. Have you made any more progress trying it out?
I don’t really know how to implement it or even try it out, since there is no documentation on how to make it work with Rasa.
Hi Eli, Arjaan
I used SymSpell’s Python implementation in our chatbot as part of our pipeline. It’s amazingly fast and also very useful for correcting domain-level words, i.e. words that are not part of the standard English language.
So my pipeline has a spell-correcting service which is called every time before the utterance is passed to the agent.
hope it helps
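For anyone wondering what "correct the utterance before it reaches the agent" could look like, here is a rough, self-contained Python sketch. It is not the code from the post above and does not use the SymSpell library or any Rasa API: `TinySpellCorrector`, `correct_utterance`, `handle_user_message`, and `send_to_agent` are hypothetical names, the vocabulary is made up, and candidates are verified with plain Levenshtein distance rather than the Damerau-Levenshtein distance SymSpell describes. In a real bot you would presumably swap the toy corrector for the actual library and load your domain words into its dictionary.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Set


def deletes(word: str, max_distance: int) -> Set[str]:
    """The word itself plus every variant with 1..max_distance characters removed."""
    out = {word}
    for k in range(1, min(max_distance, len(word)) + 1):
        for keep in combinations(range(len(word)), len(word) - k):
            out.add("".join(word[i] for i in keep))
    return out


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (assumption: stand-in for Damerau-Levenshtein)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


class TinySpellCorrector:
    """Toy symmetric-delete corrector; hypothetical, for illustration only."""

    def __init__(self, vocabulary: List[str], max_distance: int = 2) -> None:
        self.max_distance = max_distance
        self.words = set(vocabulary)
        # Pre-calculation: map every delete variant back to its dictionary words.
        self.index: Dict[str, Set[str]] = defaultdict(set)
        for word in self.words:
            for variant in deletes(word, max_distance):
                self.index[variant].add(word)

    def correct(self, token: str) -> str:
        if token in self.words:
            return token
        # Lookup: deletes of the input are matched against the precomputed index.
        candidates: Set[str] = set()
        for variant in deletes(token, self.max_distance):
            candidates |= self.index.get(variant, set())
        scored = [(edit_distance(token, c), c) for c in candidates]
        scored = [s for s in scored if s[0] <= self.max_distance]
        # Unknown tokens with no close match are passed through unchanged.
        return min(scored)[1] if scored else token


def correct_utterance(text: str, corrector: TinySpellCorrector) -> str:
    """Lowercase, split on whitespace, and correct each token."""
    return " ".join(corrector.correct(tok) for tok in text.lower().split())


def handle_user_message(
    text: str, corrector: TinySpellCorrector, send_to_agent: Callable[[str], str]
) -> str:
    """Run spell correction first, then hand the cleaned text to the agent."""
    return send_to_agent(correct_utterance(text, corrector))


# Made-up vocabulary, including a domain word ("rasa") that is not standard English.
corrector = TinySpellCorrector(["i", "want", "to", "check", "my", "balance", "rasa"])
print(correct_utterance("I wnat to chek my blance", corrector))
# i want to check my balance
```

The point of the design is that the correction step lives outside the agent, so the NLU model only ever sees the cleaned text and the vocabulary can be extended with domain terms without retraining anything.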
@gaurvipul could you maybe share the code of your SymSpell implementation for Rasa?