Implementing TF-IDF as a custom component?

Hi.

As part of a project at work, I’m building a bot that can answer a predefined set of FAQs. Given the large volume of questions we have, writing training data for all of them (including the implementation with Rasa X) will take a lot of time.

I’ve found that simple TF-IDF vectorization produces really good results for FAQs whose questions are worded similarly but have entirely different answers, e.g.:

What is an escrow account?
What is an escrow cushion?

This pair yields a very accurate result with TF-IDF (given how it’s designed to emphasize distinctive words, of course), but Rasa requires a substantial amount of training data to differentiate between the two acceptably.
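To illustrate why the distinctive word ends up carrying the decision, here is a rough, stdlib-only sketch. The third question and the helper names (`tokenize`, `idf_weights`) are made up for the example, and the smoothed IDF formula is just one common variant, so treat it as an illustration rather than what any particular library does.

```python
import math
from collections import Counter

# Hypothetical mini corpus; only the first two questions come from my FAQ example.
questions = [
    "What is an escrow account?",
    "What is an escrow cushion?",
    "How do I change my escrow payment?",
]

def tokenize(text):
    """Lowercase, drop the question mark, split on whitespace."""
    return text.lower().replace("?", "").split()

def idf_weights(docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    # Document frequency: in how many questions each term appears.
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

weights = idf_weights(questions)
for term in ("escrow", "account", "cushion"):
    print(term, round(weights[term], 3))
```

"escrow" appears in every question, so it gets the lowest weight, while "account" and "cushion" each appear once and get the highest.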

I’ve read the tutorial on designing custom components, but there doesn’t seem to be a way to really approach this particular problem.

How should I approach this?

Anything on this, guys?

I also think TF-IDF is suitable in this case. Have you solved this problem? I am going to use the sklearn library, but I don’t know how to apply it in the sparse_featurize folder.

I wasn’t able to find a “clean” solution for this, so what I did was generalize all my FAQ data into one intent (or several, if you have sections). I then used a custom action to route the user’s message to a separate Python module that runs a normal TF-IDF search and responds with the results.
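For anyone who wants a starting point, here is a minimal sketch of what such a separate search module could look like. It is not my actual code: the FAQ entries are placeholders, the names `TfidfIndex`, `answer_faq` and `threshold` are hypothetical, and the TF-IDF is hand-rolled with the standard library only so the example runs anywhere. In practice a library vectorizer (sklearn, for instance) could probably replace the hand-rolled part.

```python
import math
from collections import Counter

# Hypothetical FAQ data; the answers are placeholders, not real content.
FAQS = {
    "What is an escrow account?": "Placeholder answer about escrow accounts.",
    "What is an escrow cushion?": "Placeholder answer about escrow cushions.",
    "How do I change my escrow payment?": "Placeholder answer about payments.",
}

def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return "".join(c if c.isalnum() else " " for c in text.lower()).split()

class TfidfIndex:
    def __init__(self, questions):
        self.questions = list(questions)
        docs = [tokenize(q) for q in self.questions]
        n = len(docs)
        df = Counter(t for d in docs for t in set(d))
        # Smoothed IDF: terms found in fewer questions get larger weights.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._vectorize(d) for d in docs]

    def _vectorize(self, tokens):
        # TF-IDF weights, L2-normalised so a dot product is the cosine similarity.
        tf = Counter(t for t in tokens if t in self.idf)
        vec = {t: count * self.idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, message):
        """Return the closest stored question and its similarity score."""
        query = self._vectorize(tokenize(message))
        scores = [sum(w * vec.get(t, 0.0) for t, w in query.items())
                  for vec in self.vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        return self.questions[best], scores[best]

def answer_faq(message, index, threshold=0.3):
    """Return the best-matching answer, or None if nothing is similar enough."""
    question, score = index.search(message)
    if score < threshold:
        return None
    return FAQS[question]

index = TfidfIndex(FAQS)
print(answer_faq("what's an escrow cushion", index))
```

The custom action would then pass the user’s latest message to something like `answer_faq` and utter whatever comes back, falling back to a default reply when it returns `None`. The threshold value is arbitrary and would need tuning on real data.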

Do your two steps run in parallel, like in Rasa? I mean the classifier and the TF-IDF answer selector.

I don’t know how to assign a separate featurizer to each:

  • Count vectors (n-grams) -> DIET classifier
  • TF-IDF -> Rasa’s default response selector

Can these run in parallel in Rasa?

I am going to share the custom TF-IDF when I finish it.

No, they’re sequential. It’s really just one custom action, but it sends the message to a different app entirely, which does the TF-IDF processing and sends back a response.
