I’m really sorry for reviving this old topic, but implementing this is really important to me and I’m still stuck.
As part of a project at work, I’m building a bot that can answer a predefined set of FAQs. Given the large volume of questions we have, writing training data for all of them (including the implementation with Rasa X) will take a lot of time.
I’ve found that simple TF-IDF vectorization produces really good results for FAQs that are worded similarly but have entirely different answers. For example:
What is an escrow account?
What is an escrow cushion?
TF-IDF tells these two apart very accurately (as expected, given that it is designed to give more weight to rare, distinguishing words), but Rasa requires a substantial amount of training data to differentiate between them acceptably.
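A minimal sketch of this kind of TF-IDF matching, in plain Python with no external libraries, might look like the following. Everything beyond the two escrow questions is an assumption made for illustration: the third FAQ entry, the placeholder answers, the helper names `tokenize`, `build_index` and `best_match`, and the smoothed IDF formula, which is just one common variant.

```python
import math
from collections import Counter

# Hypothetical FAQ set: the third question and all answers are placeholders.
FAQS = {
    "What is an escrow account?": "<answer about escrow accounts>",
    "What is an escrow cushion?": "<answer about escrow cushions>",
    "How do I reset my password?": "<answer about password resets>",
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    words = (w.strip("?.,!").lower() for w in text.split())
    return [w for w in words if w]


def build_index(questions):
    """Return (idf, vectors): IDF weights and one TF-IDF vector per question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n = len(docs)
    # Document frequency: number of questions each term appears in.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (an assumed variant): rarer terms get larger weights.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(query, questions, idf, vectors):
    """Return the most similar stored question and its similarity score."""
    counts = Counter(tokenize(query))
    # Terms never seen in the FAQ set carry no weight and are dropped.
    qvec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(qvec, v) for v in vectors]
    i = max(range(len(scores)), key=scores.__getitem__)
    return questions[i], scores[i]


if __name__ == "__main__":
    questions = list(FAQS)
    idf, vectors = build_index(questions)
    question, score = best_match("Tell me about the escrow cushion",
                                 questions, idf, vectors)
    print(question, round(score, 3), FAQS[question])
```

Because "escrow" appears in both questions while "account" and "cushion" each appear in only one, the rarer word dominates the score. A score of 0.0 means the query shares no terms with any stored question, so a caller would probably want a threshold before returning an answer.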
I’ve read the tutorial on designing custom components, but it doesn’t seem to cover how to approach this particular problem.
How should I approach this?