Need help in data extraction

Teluguntla123 · December 14, 2018, 1:50am

Hello friends, I am creating a chatbot. Currently I am dealing with the scenario where the user misspells a word in a sentence. How can we make the NLU find the right word and process it? For example, suppose the user accidentally types “I am looking for restarant near me”. In this sentence, “restaurant” is misspelled as “restarant”. My intention is that the NLU should find the word the user intended and process it with the model the bot was trained on.

Is it possible to do this?

JiteshGaikwad · December 14, 2018, 4:01am

Yes, you can do that. You need to provide ample data.

Teluguntla123 · December 14, 2018, 7:29am

Hi @JiteshGaikwad. Sorry to ask, but what is meant by “ample data”? One of my friends said we can also use the gensim Python library. Which one do you think is the recommended and easiest way?

JiteshGaikwad · December 14, 2018, 7:40am

Hey @Teluguntla123, I haven’t worked with the gensim library, so I can’t comment on that. By “ample data” I mean enough training data that, even if you misspell a word as in the case above, the model will still predict the intent with good accuracy. I was working on a restaurant bot and tried the same use case, and it was able to predict the correct intent, as you can see below.

========================================================

Teluguntla123 · December 14, 2018, 7:58am

Okay @JiteshGaikwad. I looked at your output, and it is accurate. In your training data you spelled the word “restaurant” correctly, right? If so, I want to know how it is matching “restarant” with “restaurant”.

If you don’t mind, can you show me your training data? I’ll look at it and learn how to train with ample data.

Thank you

JiteshGaikwad · December 14, 2018, 8:08am

Hey @Teluguntla123, for some reasons I can’t share the file, but I can share the intent data:

intent:RestaurantSearch

show me hotels
show me hotels near me
I am looking for a restaurant.
can you show me the restaurant near me
can you show me the restaurant in my location
show me restaurants nearby
show me hotels in my location
i am feeling hungry show me some hotels near me
i am feeling hungry show me some restaurant near me
i am feeling hungry show me some restaurant nearby
i am feeling hungry show me some hotels nearby
i am feeling hungry
show me hotels near me
show me hotels nearby

Teluguntla123 · December 14, 2018, 8:25am

Hey @JiteshGaikwad. You just trained the model on this data and it predicts correctly for misspellings too, right? You trained the intent with many examples that contain the word “restaurant”. Is that why the model can predict the intent even if you misspell it? Am I right?

You mean we have to train the model with a large amount of data so that it can predict even when there is a slight spelling mistake, right?

souvikg10 · December 14, 2018, 8:43am

The misspellings are predicted because of n-grams.

I think the text is split into tri-grams during tokenisation, depending on the pipeline. This lets a token be split into many character-level tokens, which allows the model to predict some misspellings. However, you could get confusion between words whose misspellings are difficult to tell apart.
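
To make the idea concrete, here is a minimal sketch in plain Python of how character tri-grams overlap between a word and its misspelling. It is only an illustration, not the actual Rasa pipeline code: the helper names `char_ngrams` and `jaccard` are made up for this example, and padding each word with a space on both sides is an assumption about how word boundaries are handled.

```python
def char_ngrams(word, n=3):
    """Return the set of character n-grams of a word.

    The word is padded with one space on each side so that the
    start and end of the word produce their own n-grams
    (an assumption made for this illustration).
    """
    padded = f" {word} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Share of n-grams two sets have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


correct = char_ngrams("restaurant")
typo = char_ngrams("restarant")
other = char_ngrams("hotel")

# Tri-grams the misspelling still shares with the correct word.
print(sorted(correct & typo))

# Overlap scores: high for the misspelling, zero for an unrelated word.
print(round(jaccard(correct, typo), 2))
print(round(jaccard(correct, other), 2))
```

Here “restarant” keeps 7 of the 10 tri-grams of “restaurant”, so a model that sees character-level features still gets most of the signal. Two different words that share most of their tri-grams would be hard to separate for the same reason.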

JiteshGaikwad · December 14, 2018, 8:44am

Yes, I agree with you @souvikg10.

JiteshGaikwad · December 14, 2018, 8:47am

No, I don’t mean that. I mean just give enough data. As you can see, I have around 20 utterances for my RestaurantSearch intent, so you don’t need to provide a large dataset.

Teluguntla123 · December 14, 2018, 9:02am

Okay, got it @JiteshGaikwad. Thank you so much.

Teluguntla123 · December 14, 2018, 9:05am

Hello @souvikg10. May I know how you can tell that it is split into tri-grams? And what are n-grams? Does n depend on the pipeline we use?

vigneshgig · January 8, 2019, 6:24am

Hi @sovikg10, how can I integrate n-grams into a Rasa custom pipeline? I have many ideas, such as integrating other NLP tools with Rasa, but I don’t have much knowledge of Rasa. I started with it only a week ago, so I am unable to work on a custom pipeline and custom components. If you could give me an example, that would be much better, so that I can start working. I searched for tutorials on custom pipelines and custom components but didn’t find any, and the Rasa docs have only a little information on them, which is not sufficient for me to work with. Thanks for the help.
