Hello friends,
I am creating a chatbot. Currently I am dealing with the scenario where the user misspells a word in a sentence. How can we make the NLU find the right word and process it? For example, suppose the user accidentally types “I am looking for restarant near me”. In this sentence, “restaurant” is misspelled as “restarant”. My intention is that the NLU should find the word the user intended and process it with the model the bot was trained on.
Hi @JiteshGaikwad. Sorry to ask, but what is meant by “ample of data”? One of my friends said we can also use the gensim Python library. Which one do you think is the recommended and easiest way?
Hey @Teluguntla123, I haven’t worked with the gensim library, so I can’t comment on that. By “ample of data” I mean enough training data that, even if you misspell a word as in the case above, the model will still predict the intent with good accuracy. I was working on a restaurant bot and tried this same use case, and it was able to predict the correct intent, as you can see below.
Okay @JiteshGaikwad. I looked at your output, and it is accurate. In your training data you spelled the word “restaurant” correctly, right? If so, I want to know how it is matching “restarant” with “restaurant”.
If you don’t mind, can you show me your training data? I’ll look at it and learn how to train with ample data.
Hey @JiteshGaikwad. You just trained the model on this data, and the model is predicting correctly for misspellings too, right?
You trained the intent with many examples that contain the word “restaurant”. Is that why the model is able to predict correctly even if you misspell it? Am I right?
You mean we have to train the model with large amounts of data so that it can still predict correctly when there is a slight spelling mistake, right?
The misspellings are still predicted correctly because of character n-grams.
I think the text is split into tri-grams during tokenisation, depending on the pipeline, so a token is broken into several character-level tokens. That lets the model handle some misspellings. However, you could still get confusion between words whose misspellings are hard to tell apart.
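To illustrate the n-gram point, here is a minimal sketch in plain Python. It is not Rasa's actual featurizer, and the helper names `char_ngrams` and `jaccard` are made up for this example. It only shows why a misspelled word can still look similar to the correct one once both are broken into character tri-grams.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word (hypothetical helper)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a, b):
    """Share of n-grams two sets have in common (intersection over union)."""
    return len(a & b) / len(a | b)


correct = char_ngrams("restaurant")  # res, est, sta, tau, aur, ura, ran, ant
typo = char_ngrams("restarant")      # res, est, sta, tar, ara, ran, ant
other = char_ngrams("hospital")      # an unrelated word, for comparison

shared = sorted(correct & typo)
print(shared)                    # ['ant', 'est', 'ran', 'res', 'sta']
print(jaccard(correct, typo))    # 0.5
print(jaccard(correct, other))   # 0.0
```

The typo still shares five of its seven tri-grams with the correct spelling, so a model that sees character n-gram features gets a lot of the same signal. The same overlap is also why two genuinely different words with similar spellings can get confused.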
No, I don’t mean that. I mean just give enough data. As you can see, I have around 20 utterances for my RestaurantSearch intent, so you don’t need a large dataset.
Hi @sovikg10, how can I integrate n-grams into a Rasa custom pipeline? I have many ideas, like integrating other NLP libraries with Rasa, but I don’t have much knowledge of Rasa. I only started with it a week ago, so I am unable to work with custom pipelines and custom components yet. If you could give me an example, it would be much better, so that I can start working. I searched for tutorials on custom pipelines and custom components but didn’t find any, and the Rasa docs have only a little information on them, which is not sufficient for me to work with. Thanks for the help.