Need help in data extraction

Hello friends, I am creating a chatbot. Currently I am dealing with the scenario where a user misspells a word in a sentence. How can we make the NLU find the right word and process it? For example, suppose the user accidentally types “I am looking for restarant near me”. In this sentence, “restaurant” is misspelled as “restarant”. My intention is that the NLU should find the word the user intended and process it with the model the bot is trained on.

Is it possible to do this?

Yes, you can do that; you need to provide ample data.

Hi @JiteshGaikwad. Sorry to ask, but what is meant by “ample data”? One of my friends said we can also use the gensim Python library. Which one do you think is the recommended and easiest way?

Hey @Teluguntla123, I haven’t worked with the gensim library, so I can’t comment on that. By “ample data” I mean enough training data that, even if you misspell words as in the case above, the model will still predict the intent with good accuracy. I was working on a restaurant bot and tried the same use case, and it was able to predict the correct intent, as you can see below.


Okay @JiteshGaikwad. I looked at your output, and it is accurate. In your training data you spelled the word “restaurant” correctly, right? If so, I want to know how it is matching “restarant” with “restaurant”.

If you don’t mind, can you show me your training data? I’ll look at it and learn how to train with ample data.

Thank you

Hey @Teluguntla123, for some reasons I can’t share the file, but I can share the intent data:


  • show me hotels

  • show me hotels near me

  • I am looking for a restaurant.

  • can you show me the restaurant near me

  • can you show me the restaurant in my location

  • show me restaurants nearby

  • show me hotels in my location

  • i am feeling hungry show me some hotels near me

  • i am feeling hungry show me some restaurant near me

  • i am feeling hungry show me some restaurant nearby

  • i am feeling hungry show me some hotels nearby

  • i am feeling hungry

  • show me hotels near me

  • show me hotels nearby

Hey @JiteshGaikwad. You just trained the model on this data, and the model is predicting correctly for misspellings too, right? Since you trained the intent with many examples that contain the word “restaurant”, is that why the model is able to predict the intent even if you misspell it? Am I right?

You mean we have to train the model with large amounts of data, so that it can predict correctly even when there is a slight spelling mistake, right?

The misspellings are still predicted correctly because of n-grams.

I think it is split into tri-grams during tokenisation, depending on the pipeline, which allows a token to be split into many character-level tokens. That lets the model predict some misspellings. However, you could get confusion between words whose misspellings are difficult to differentiate.
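To make the idea concrete, here is a minimal sketch of why character tri-grams tolerate a typo like “restarant”. It is plain Python for illustration only, not Rasa's actual featurizer: the helper names `char_ngrams`, `jaccard` and `closest` are made up for this example, and padding each word with a space on both sides is an assumption.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of `word`.

    The word is padded with one space on each side so that the first and
    last letters also show up in word-boundary n-grams (an assumption made
    for this illustration).
    """
    padded = f" {word} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Overlap of two sets: size of the intersection divided by size of the union."""
    return len(a & b) / len(a | b)


def closest(word, vocabulary, n=3):
    """Return the vocabulary word whose n-gram set overlaps most with `word`."""
    grams = char_ngrams(word, n)
    return max(vocabulary, key=lambda v: jaccard(grams, char_ngrams(v, n)))


correct = char_ngrams("restaurant")
typo = char_ngrams("restarant")
shared = correct & typo

print(sorted(shared))          # tri-grams both spellings have in common
print(jaccard(correct, typo))  # overlap score between 0 and 1
print(closest("restarant", ["restaurant", "hotels", "hungry"]))
```

In this sketch the two spellings share 7 tri-grams (out of 10 for “restaurant” and 9 for “restarant”), while “restarant” shares none with “hotels” or “hungry”, so the typo still lands closest to the intended word. Two words that share most of their tri-grams would be much harder to tell apart this way, which is the kind of confusion mentioned above.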

Yes, I agree with you @souvikg10.

No, I don’t mean that. I mean just give enough data. As you can see, I have around 20 utterances for my RestaurantSearch intent, so you don’t need to provide a large dataset.

Okay, got it @JiteshGaikwad. Thank you so much.

Hello @souvikg10. May I know how you can tell that it is split into tri-grams? Also, what are n-grams, and does n depend on the pipeline we use?

Hi @souvikg10, how can I integrate n-grams into a Rasa custom pipeline? I have many ideas, like integrating other NLP tools with Rasa, but I don’t have much knowledge of Rasa. I only started with it a week ago, so I am unable to work with custom pipelines and custom components yet. If you could give me an example, it would help me get started. I searched for tutorials on custom pipelines and custom components but didn’t find any, and the Rasa docs have only a little information on them, which is not sufficient for me to work with. Thanks for the help.