Hi.
As part of a project at work, I’m building a bot that can answer a predefined set of FAQs. Given the large volume of questions we have, writing training data for them all (including implementation with Rasa X) will take a lot of time.
I’ve found that some simple TF-IDF vectorization produces really good results for answering FAQs that are similar but have entirely unique answers. E.g.:
What is an escrow account?
What is an escrow cushion?
These yield very accurate results with TF-IDF (given how it’s designed to emphasize distinctive words, of course), but Rasa needs a substantial amount of training data to differentiate between the two acceptably.
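To illustrate why TF-IDF separates these two questions so easily, here is a minimal sketch using only the Python standard library. The FAQ entries, the placeholder answers, and the helper names (`tokenize`, `build_index`, `best_match`) are all made up for the example, and the smoothed IDF formula is just one common variant, so treat it as an illustration rather than the exact setup described above.

```python
import math
import re
from collections import Counter

# Hypothetical FAQ entries; the answers are placeholders, not real content.
FAQS = {
    "What is an escrow account?": "<answer about escrow accounts>",
    "What is an escrow cushion?": "<answer about escrow cushions>",
    "How do I change my payment date?": "<answer about payment dates>",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(questions):
    """Return the IDF table and one TF-IDF vector (a dict) per question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n = len(docs)
    # Document frequency: in how many questions each term appears.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF: terms that appear in fewer questions get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(message, questions, idf, vectors):
    """Return the stored question most similar to the message, with its score."""
    counts = Counter(tokenize(message))
    # Terms that never occur in the FAQ set carry no weight and are skipped.
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(query, v) for v in vectors]
    best = max(range(len(questions)), key=scores.__getitem__)
    return questions[best], scores[best]


QUESTIONS = list(FAQS)
IDF, VECTORS = build_index(QUESTIONS)

question, score = best_match("what's an escrow cushion", QUESTIONS, IDF, VECTORS)
print(question, "->", FAQS[question])
```

Because "cushion" and "account" each occur in only one question, they get a higher weight than the shared words "what", "is", "an", and "escrow", and that is what tips the match.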
I’ve read the tutorial on designing custom components, but there doesn’t seem to be a way to really approach this particular problem.
How should I approach this?
I also think TF-IDF is suitable in this case. Have you solved this problem?
I am going to use the sklearn library, but I don’t know how to apply it in the sparse_featurize folder.
I wasn’t able to find a “clean” solution for this, so what I did was generalize all my FAQ data into one intent (or several, if you have sections). I then used a custom action to route the user’s message to a separate Python module that runs a normal TF-IDF search and responds with the results.
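For anyone wanting a starting point, a rough sketch of that routing might look like the following. The action name `action_answer_faq`, the `FAQ_ANSWERS` data, and the `search_faq` helper are hypothetical. `search_faq` here is only a word-overlap stand-in for the separate module, not TF-IDF, so you would swap in your real search. The guarded import is there only so the sketch can be tried without `rasa_sdk` installed.

```python
import re

try:
    from rasa_sdk import Action
except ImportError:
    # Fallback so the sketch can be exercised without rasa_sdk installed.
    Action = object

# Hypothetical FAQ data; the answers are placeholders.
FAQ_ANSWERS = {
    "What is an escrow account?": "<answer about escrow accounts>",
    "What is an escrow cushion?": "<answer about escrow cushions>",
}


def _words(text):
    """Lowercased set of alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_faq(message):
    """Stand-in for the separate search module (assumption: plain word
    overlap, NOT TF-IDF). Returns an answer string, or None if nothing
    in the FAQ set shares a word with the message."""
    words = _words(message)

    def overlap(question):
        return len(words & _words(question))

    best = max(FAQ_ANSWERS, key=overlap)
    return FAQ_ANSWERS[best] if overlap(best) > 0 else None


class ActionAnswerFaq(Action):
    """Custom action triggered by the single generalized FAQ intent."""

    def name(self):
        return "action_answer_faq"

    def run(self, dispatcher, tracker, domain):
        # Forward the raw user text to the search module.
        text = tracker.latest_message.get("text") or ""
        answer = search_faq(text)
        dispatcher.utter_message(
            text=answer or "Sorry, I could not find an answer to that."
        )
        return []  # no events to add to the conversation
```

The action would also need to be listed in the domain and tied to the generalized FAQ intent through a rule or story, which is not shown here.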
Do both of your steps run in parallel, as they would in Rasa? I mean the classifier and the TF-IDF-based response selection.
I don’t know how to assign a separate featurizer to each component:
- CountVectors n-gram -> DIET classifier
- TF-IDF -> default Response Selector of Rasa
Is it possible to run these in parallel in Rasa?
I am going to share the custom TF-IDF component when I finish it.
No, they’re sequential. It’s really just one custom action, but it sends the message to a different app entirely, which does the TF-IDF processing and sends back a response.