Mapping FAQ with RASA for large dataset (2000+)

khut · November 3, 2018, 10:08am

RASA consists of RASA NLU and RASA Core. I have tested it and understand some parts of it. I tried it on a sample project, and it works perfectly.

I plan to take it to the next level: I want to build an FAQ system on the RASA stack, using the TensorFlow backend.

I have over 1,200 question-and-answer pairs. First, NLU understands and classifies the intent and extracts entities. Second, it passes the JSON response to RASA Core, which maps it to an answer and responds to the user. It sounds simple, but when I looked into RASA it works differently: normally, RASA Core responds to the user based on predefined stories with “utter_” templates. Predefined stories are fine, but only for a small dataset, because we have to write them manually.

How do you deal with a dataset or knowledge base that grows larger, to 1,000+ or 5,000+ pairs? We cannot map it manually. I have looked around but haven't found a proper way to deal with it yet.

Previously, I used a retrieval model: scikit-learn's TF-IDF vectorizer as a bag of words, with cosine similarity to compare the input against the stored questions and return the index of the most similar one. Once the index is found, the answer is selected by that index. But this kind of solution is not effective, because the meaning is lost, among other problems.
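For reference, the retrieval baseline described above (TF-IDF weighting plus cosine similarity, then answer lookup by index) can be sketched without scikit-learn. This is a minimal, dependency-free illustration, not a reproduction of `TfidfVectorizer`; the class name `TfidfRetriever` and the two sample rows are assumptions for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Hypothetical helper: return the answer whose stored question is most similar."""

    def __init__(self, questions: list[str], answers: list[str]):
        self.answers = answers
        docs = [Counter(tokenize(q)) for q in questions]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed IDF, similar to scikit-learn's default (smooth_idf=True).
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = [self._weigh(doc) for doc in docs]

    def _weigh(self, counts: Counter) -> dict[str, float]:
        # Raw term frequency times IDF; terms never seen in the FAQ are dropped.
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def best_match(self, query: str) -> tuple[int, float]:
        q = self._weigh(Counter(tokenize(query)))
        # Vectors are L2-normalised, so the dot product is the cosine similarity.
        scores = [sum(w * v.get(t, 0.0) for t, w in q.items()) for v in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return best, scores[best]

    def answer(self, query: str) -> str:
        index, _ = self.best_match(query)
        return self.answers[index]


faq = TfidfRetriever(
    ["How to login", "Reset password"],
    ["Go to login Page", "Click on Reset Button"],
)
print(faq.answer("how do I reset my password"))
```

Because it only rewards shared words, a paraphrase with no overlapping terms scores 0 against every stored question, which is the loss of meaning mentioned above.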

Has anyone found a good solution for this?

Thank you

souvikg10 · November 3, 2018, 10:30am

I don’t think Rasa Core would help you much for an FAQ. Unless your FAQs are contextual and personalized, it doesn’t make much sense to use Rasa Core.

If you catch an intent that links to an answer in a knowledge base, just use if-then-else statements. I would even use Elasticsearch with the simple tokenisation pipeline of the ELK stack to find the right answer. It is very powerful.
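A minimal sketch of that if-then-else approach, assuming the NLU parse result has the shape Rasa NLU returns (`{"intent": {"name": ..., "confidence": ...}, ...}`). The `KNOWLEDGE_BASE` dict, the fallback text and the 0.5 threshold are hypothetical; in practice the answers would live in a database or search index.

```python
# Hypothetical knowledge base: one answer per FAQ intent.
KNOWLEDGE_BASE = {
    "login": "Go to login Page",
    "password": "Click on Reset Button",
}

FALLBACK = "Sorry, I couldn't find an answer to that. Could you rephrase?"


def answer_from_parse(parse: dict, threshold: float = 0.5) -> str:
    """Map an NLU parse result to a knowledge-base answer with plain if/else logic."""
    intent = parse.get("intent") or {}
    name = intent.get("name")
    confidence = intent.get("confidence", 0.0)
    if name in KNOWLEDGE_BASE and confidence >= threshold:
        return KNOWLEDGE_BASE[name]
    else:
        # Unknown intent or low confidence: do not guess an answer.
        return FALLBACK


print(answer_from_parse({"intent": {"name": "password", "confidence": 0.87}}))
```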

However, if you would like to add context to your conversation: for example, a conversation about a general question like product pricing can be handled in two ways.

Q - How much does an AWS Lightsail subscription cost?
A - You provide a full answer regarding all types of cost

Here you could just use NLU with a retrieval-based dialogue system.

However, the same conversation can be handled as

U - How much does an AWS Lightsail subscription cost?
B - How much memory are you looking for?
U - 500 MB
B - How many cores would you need?
U - 1 core
B - It will cost around $3.50 a month.

Both are FAQs: one is simple, while the other is contextual, which is where Rasa Core brings value.
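To make the contextual variant concrete, here is the slot-filling idea behind the Lightsail dialogue in plain Python. It is only an illustration: Rasa Core learns when to ask what from stories rather than from a hand-written loop like this, and the `PRICES` table is hypothetical, not real AWS pricing.

```python
# Hypothetical price table: (memory in MB, cores) -> monthly price in USD.
PRICES = {(512, 1): 3.50, (1024, 1): 5.00, (2048, 1): 10.00}

# Slots to fill, in the order the bot should ask for them.
QUESTIONS = {
    "memory_mb": "How much memory would you need?",
    "cores": "How many cores would you need?",
}


def next_turn(slots: dict) -> str:
    """Ask for the first missing slot; once all are filled, quote a price."""
    for slot, question in QUESTIONS.items():
        if slots.get(slot) is None:
            return question
    price = PRICES.get((slots["memory_mb"], slots["cores"]))
    if price is None:
        return "I don't have a plan with that configuration."
    return f"It will cost around ${price:.2f} a month."


slots = {}
print(next_turn(slots))  # asks for memory
slots["memory_mb"] = 512
print(next_turn(slots))  # asks for cores
slots["cores"] = 1
print(next_turn(slots))  # It will cost around $3.50 a month.
```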

Now, for a very large dataset, I think you are leaning towards a question answering system: you provide a corpus to a neural network, it learns the embeddings, and the tokens can then map or point to the right statements in the text. I am not sure how well it works, but this is not something you can achieve with Rasa at the moment.

khut · November 5, 2018, 3:38am

Thank you @souvikg10. I think RASA doesn't support that kind of thing yet. Maybe I'll just use RASA NLU to categorize the type of request first and return the “Intent” to the FAQ database. An LSTM (encoder-decoder) would then be used on the knowledge base to find the matching question-and-answer pair.

znat · November 5, 2018, 5:54pm

Look at the Rasa Addons FAQ example. I think it does what you need (it lets you manage your KB without having to modify the Core model every time).

souvikg10 · November 5, 2018, 5:59pm

Doesn’t this make your use of Rasa Core kind of moot? Maybe I am wrong.

I mean, if NLU detects an intent, I might as well write custom logic in code to retrieve the information, because I know it is an FAQ-type question.

I personally don’t recommend Rasa Core for FAQs unless they are contextualised and personalised, which is what we are trying to address on our side.

znat · November 5, 2018, 10:51pm

A project starts with an FAQ and becomes more complicated over time with contextualized conversations, so why not start with the right stack? Plus, FAQs can be asked as side questions in more complex flows, and having all one-turn questions grouped makes dealing with those side questions easier.
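One way to picture the side-question argument: if every one-turn FAQ intent follows a naming convention (here a hypothetical `faq_` prefix), a single handler can answer any of them from the knowledge base and then repeat whatever question the ongoing flow was waiting on. This is a plain-Python sketch of the idea, not how Rasa Addons implements it; the class and intent names are assumptions.

```python
from typing import Optional

# Hypothetical convention: every one-turn FAQ intent starts with "faq_".
FAQ_ANSWERS = {
    "faq_login": "Go to login Page",
    "faq_password": "Click on Reset Button",
}


class Dialogue:
    def __init__(self) -> None:
        # The question the current multi-turn flow is waiting on, if any.
        self.pending_question: Optional[str] = None

    def ask(self, question: str) -> list[str]:
        self.pending_question = question
        return [question]

    def handle(self, intent: str) -> list[str]:
        if intent.startswith("faq_"):
            reply = [FAQ_ANSWERS.get(intent, "Sorry, I don't know that one yet.")]
            # Answer the side question, then resume the interrupted flow.
            if self.pending_question:
                reply.append(self.pending_question)
            return reply
        # Anything else belongs to the contextual flow (stubbed out here).
        return [f"(flow handles intent '{intent}')"]


bot = Dialogue()
bot.ask("How many cores would you need?")
print(bot.handle("faq_password"))
```

Adding a new FAQ then means adding a `faq_` intent and a knowledge-base entry, without touching the flow logic.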

khut · November 6, 2018, 4:58am

@znat, I take your suggestion, but when I investigated the RASA Addons FAQ you mentioned, I started to get puzzled. I have attached a sample image of a sheet consisting of Question | Answer | Type. Example:

| # | Question | Answer | Type |
|---|---|---|---|
| 1 | How to login | Go to login Page | login |
| 2 | Reset password | Click on Reset Button | password |

First, I train RASA to identify the Type (intent) of the input and return the response as JSON to Core.

Next comes RASA Core's role. I already get a response with the “intent” and “score”; now I want RASA Core to map the input question to the correct answer. The number of FAQ pairs may grow to 2,000+, and we cannot sit and label everything manually.

In my experience, we compare questions and return the answer of the most similar one. I did this previously with scikit-learn, not in RASA.

I am trying to find a way to apply RASA Core to the case above.

znat · November 6, 2018, 2:33pm

Are you saying that the intent is mapped to a topic and not a particular question? How would you identify the right response then within a topic?

khut · November 7, 2018, 1:31am

@znat, it would start something like this.

First, I trained NLU to map questions to intents, so that a given question yields the correct intent. This is then passed to the second stage.

Second stage: based on the given intent, I have three options still under consideration:

1. Externally: use a deep learning model such as an LSTM, an RNN (encoder/decoder) or a Siamese network, trained on question-and-answer pairs, so that given a question it returns the answer. So far, though, I couldn’t find any proper evidence of how to really implement it.
2. Externally: compare the given question with all questions of the same Type (class) and return the index of the most similar one; the index is then used to look up the answer. I would use scikit-learn's pairwise cosine similarity with one-hot encoding or a word-vector representation (Word2Vec).
3. Internally: RASA Core, as I see it, could map the question to the answer in the form of stories. But we have to write them manually, which limits it for a large dataset. You mentioned the add-on; I checked it but couldn’t really figure out how to solve my problem with it yet.
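Option 2 can be sketched in a few lines: filter the sheet's rows by the type NLU predicted, score only those questions by bag-of-words cosine similarity, and return the answer of the best match. The first two rows come from the sheet above; the third row and the function name `answer_within_type` are hypothetical additions for illustration.

```python
import math
import re
from collections import Counter

# Rows in the same shape as the sheet: (question, answer, type).
FAQ_ROWS = [
    ("How to login", "Go to login Page", "login"),
    ("Reset password", "Click on Reset Button", "password"),
    ("Change password", "Open Settings and choose Change Password", "password"),  # hypothetical row
]


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def answer_within_type(question: str, predicted_type: str) -> str:
    """Stage 2: compare only against questions of the type NLU predicted."""
    candidates = [row for row in FAQ_ROWS if row[2] == predicted_type]
    if not candidates:
        return "Sorry, I don't have an answer for that yet."
    query = bag_of_words(question)
    best = max(candidates, key=lambda row: cosine(query, bag_of_words(row[0])))
    return best[1]


print(answer_within_type("how can I change my password", "password"))
```

Restricting the comparison to one type keeps each search small even when the whole sheet grows to thousands of rows, and a misclassified type can be caught by a low best score.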

I may not be correct on this point, since I’m fairly new to NLP, but I hope to solve this problem.

znat · November 7, 2018, 3:18am

It is certainly ambitious. How about starting from the simplest implementation possible, see how it goes and make a benchmark, and then trying to improve?

khut · November 7, 2018, 4:59am

@znat, I see

I will look around and start with a simple test.

Thank you

JoeTorino · November 7, 2018, 3:38pm

However, wouldn’t it make sense to use Rasa NLU, since with the right pipeline it would still be a good classifier?

souvikg10 · November 7, 2018, 3:41pm

It depends on your data. We have used Elasticsearch to search for answers in training manuals, since there we don’t have labelled data. However, if you have labelled data, you can classify it, use a dialogue (simple rule-based, or using some add-ons as @znat mentioned) and retrieve the answer from your database.

I questioned the usefulness of Rasa Core for cases where context doesn’t play a role.

JoeTorino · November 7, 2018, 3:48pm

When you say labelled data, do you mean intents?

Why did you use training manuals? Is it because they already have predefined questions?

souvikg10 · November 7, 2018, 3:58pm

Yeah, training manuals have a lot of answers in them. We just index them in Elasticsearch and use the typical tokenisation techniques it natively provides. This, however, isn’t a chatbot but rather an FAQ-driven question answering system with keyword-driven search.

Yeah, by labelled data I mean intents, with examples for every intent.

Like: Password reset (intent)

Examples:

- I want to reset my password
- How to change my password

Something like that.
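Since the worry earlier in the thread was writing 2,000+ examples by hand, note that labelled rows can be exported mechanically. The helper below (a hypothetical name) groups `(example, intent)` pairs into the Markdown training format Rasa NLU accepted around the time of this thread, with `## intent:` headers followed by `- example` lines; later Rasa versions moved to YAML, so check the docs for your version.

```python
from collections import defaultdict


def to_rasa_nlu_markdown(rows: list[tuple[str, str]]) -> str:
    """Group (example, intent) pairs into Rasa NLU's Markdown training-data format."""
    by_intent: dict[str, list[str]] = defaultdict(list)
    for example, intent in rows:
        by_intent[intent].append(example)
    blocks = []
    for intent, examples in by_intent.items():
        lines = [f"## intent:{intent}"] + [f"- {ex}" for ex in examples]
        blocks.append("\n".join(lines))
    # One blank line between intents, trailing newline at the end of the file.
    return "\n\n".join(blocks) + "\n"


rows = [
    ("I want to reset my password", "password_reset"),
    ("How to change my password", "password_reset"),
]
print(to_rasa_nlu_markdown(rows))
```

The same rows could come straight from the Question | Type columns of a spreadsheet export.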

khut · November 8, 2018, 3:13am

Yeah, I agree with @souvikg10 about indexing and Elasticsearch for FAQs, but this kind of approach alone is not really effective, because when we shuffle the words of the input question or replace them with synonyms, it can't understand our goal.

souvikg10 · November 8, 2018, 8:15am

It depends on your use case. ELK is a great way to bootstrap your knowledge base and build a simple FAQ system using keyword detection. You can even add more NLP techniques like tokenisation, stemming and stop-word removal, and use TF-IDF to detect the most likely answers for a particular question. But in the end it is keyword detection and can only go so far. When I talk about training manuals, you are looking at a process with a discrete set of words to express what you need: you can’t change the word “loan” in 1,000 different ways.
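The keyword pipeline described here (tokenisation, stop-word removal, stemming) can be sketched as below. The stop-word list and the suffix-stripping `crude_stem` are toy assumptions standing in for a real stemmer such as Porter; Elasticsearch's built-in `english` analyzer performs comparable steps for you.

```python
import re

# A tiny, hypothetical stop-word list; real analyzers ship much larger ones.
STOP_WORDS = {"a", "an", "the", "i", "my", "to", "do", "how", "can", "is", "for"}


def crude_stem(token: str) -> str:
    """Strip a couple of common English suffixes (a toy stand-in for a real stemmer)."""
    if token.endswith("ing") and len(token) > 5:
        return token[:-3]
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]
    return token


def analyze(text: str) -> list[str]:
    """Lowercase, tokenize, drop stop words, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def keyword_score(query: str, document: str) -> int:
    """Count how many distinct query keywords also appear in the document."""
    return len(set(analyze(query)) & set(analyze(document)))


print(analyze("How can I apply for loans?"))  # ['apply', 'loan']
print(keyword_score("How can I apply for loans?", "Loan application process"))  # 1
```

The score for the sample pair is 1: “loans” and “loan” meet after stemming, but “apply” and “application” do not, which shows where keyword detection stops.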

However, it depends on how complex your knowledge base is with regard to the FAQ, and what you want to achieve with it.

I truly believe that non-contextual FAQ bots aren’t going to help your end users; add context and drive a conversation to engage users. That is what adds value with a chatbot.

If a user asks, “How do I change my password?”

Your answer cannot be:

“Here are 10 different steps to do so. If you are on Windows 10, do this; else, if you are on a Mac, do that,” etc.

That completely ignores the user’s context.

JoeTorino · November 9, 2018, 8:40am

Thanks for the info, I’m trying to develop a deeper understanding of the different ways of using RASA NLU and RASA Core.

From what I understand RASA core is used for retaining memory of previous states (context) when needing to carry out further operations related to previous intents, entities and actions. So if you have a question related to the previous one then the computer will be able to understand what you are saying?

In a service chatbot I would assume this to be important.

souvikg10 · November 9, 2018, 8:59am

Indeed, you can create context about a conversation and use slots to remember what was said, but of course they should be used wisely.

Suppose you want to provide pricing support for your product:

- User: I would like to know how much it costs to buy Product X
- Bot: The cost would be $10
- User: Is it all inclusive?
- Bot: Yes, the price is all inclusive
- User: How long will it take to be delivered?
- Bot: On average, 3-5 business days
- User: I would like to order one
- Bot: Sure, I have added 1 Product X to your shopping list

As you can see, the conversation keeps the context about Product X from the beginning. This type of conversation is really good to have with your users; keep in mind that this is also an FAQ, but personalized and engaging. Rasa Core is useful in such situations.
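The Product X exchange boils down to one remembered slot. In the sketch below (plain Python, not Rasa's tracker API), a `product` entity sets the slot and later questions that omit the product fall back to it; the intent names and the `CATALOGUE` dict are hypothetical.

```python
from typing import Optional

# Hypothetical catalogue standing in for a real product database.
CATALOGUE = {
    "product x": {"price": "$10", "all_inclusive": True, "delivery": "3-5 business days"},
}


class Tracker:
    """Remembers which product the conversation is about (a single 'product' slot)."""

    def __init__(self) -> None:
        self.product: Optional[str] = None

    def respond(self, intent: str, product: Optional[str] = None) -> str:
        if product:  # an entity in this turn overrides the remembered slot
            self.product = product.lower()
        if self.product not in CATALOGUE:
            return "Which product are you asking about?"
        info = CATALOGUE[self.product]
        if intent == "ask_price":
            return f"The cost would be {info['price']}."
        if intent == "ask_all_inclusive":
            return "Yes, the price is all inclusive." if info["all_inclusive"] else "No, extras apply."
        if intent == "ask_delivery":
            return f"On average, {info['delivery']}."
        return "Sorry, I didn't get that."


t = Tracker()
print(t.respond("ask_price", product="Product X"))
print(t.respond("ask_delivery"))  # no product mentioned: the slot supplies it
```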

There are many ways to handle such questions. The generic way, which most companies settle on, is an FAQ page, but then the flavor should really be generic, and I would handle such cases with Elasticsearch-like solutions, because I don’t want to present it as a chatbot; that sends the wrong message.

A more personalised way to handle such questions lets users be more engaged with your chatbot and also helps them understand what an interaction with the company means.

Creating a chatbot and then adding generic Q&A to it, just because those questions are asked frequently, is never the right way to position your chatbot in the market. FAQs are generally built to make simple answers easy to find, while chatbots are designed to guide the users coming to your store. They are not the same.

nghuyong · August 17, 2019, 3:07pm

Hello, we built an FAQ bot based on Rasa; it works well in some simple environments.
