I’m really sorry for necro-ing this topic, but implementing this is really important and I’m still stuck.
OP
As part of a project at work, I’m building a bot that can answer a predefined set of FAQs. Given the large volume of questions we have, writing training data for them all (including implementation with RasaX) will take a lot of time.
I’ve found that simple tf-idf vectorization produces really good results for answering FAQs that are similar but have entirely unique answers. E.g.:
What is an escrow account? What is an escrow cushion?
This pair yields a very accurate result with TF-IDF (given how it’s designed to focus on unique words, of course), but Rasa requires a substantial amount of training data to differentiate between the two acceptably.
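To make the TF-IDF matching described above concrete, here is a minimal sketch using only the Python standard library. The class name `TfidfFaqMatcher`, the smoothed IDF formula, and the placeholder answers are assumptions made for illustration, not anything taken from Rasa or from the actual FAQ data; a real setup would more likely rely on an existing vectorizer such as scikit-learn's.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfFaqMatcher:
    """Hypothetical helper: matches a query to the closest stored FAQ question."""

    def __init__(self, faq):
        self.faq = dict(faq)
        self.questions = list(self.faq)
        docs = [tokenize(q) for q in self.questions]
        n_docs = len(docs)
        # Document frequency: in how many questions does each term appear?
        doc_freq = Counter(term for doc in docs for term in set(doc))
        # Smoothed IDF (an assumption): rarer terms get a larger weight.
        self.idf = {
            term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, tokens):
        """Build a unit-length TF-IDF vector, ignoring unknown terms."""
        counts = Counter(t for t in tokens if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def best_match(self, query):
        """Return (question, answer, cosine similarity) for the closest FAQ."""
        qvec = self._vectorize(tokenize(query))
        scores = [
            sum(w * dvec.get(t, 0.0) for t, w in qvec.items())
            for dvec in self.vectors
        ]
        best = max(range(len(scores)), key=scores.__getitem__)
        question = self.questions[best]
        return question, self.faq[question], scores[best]


# Placeholder answers, purely illustrative.
matcher = TfidfFaqMatcher({
    "What is an escrow account?": "<answer about escrow accounts>",
    "What is an escrow cushion?": "<answer about escrow cushions>",
})
question, answer, score = matcher.best_match("How big is my escrow cushion?")
print(question, round(score, 2))
```

Because "escrow", "what", "is" and "an" occur in both questions, they carry the lowest weight, so the single distinguishing word ("account" or "cushion") decides the match.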
I’ve read the tutorial on designing custom components, but there doesn’t seem to be a way to really approach this particular problem.
@ActuallyAcey have you tried using the ResponseSelector for this?
As for custom components - which part is unclear? You can use tf-idf vectorization as a featurizer, and then e.g. the SklearnClassifier. Is that what you’re after?
I mean, honestly I don’t know how exactly to start. HOW do I implement the vectorizer? What would the “train” method on TFIDF, an algorithm that loads and processes data on the spot, even do? And how would I set it to actually provide an intent as an output rather than entities?
I had a look at the ResponseSelector, but it seems targeted towards smalltalk and not really as a full-fledged approach.
What do you mean “not really as a full-fledged approach”? It works very well for Q&A type interactions.
You can leave the train method as a no-op; it doesn’t have to be implemented. E.g. this custom spell checker component I built as an example a while ago doesn’t use the train method:
from autocorrect import spell
from rasa.nlu.components import Component  # Rasa 1.x import path


class RasaSpellChecker(Component):
    """Corrects the spelling of tokens, leaving recognised names untouched."""

    defaults = {}
    requires = ["tokens"]
    provides = ["tokens"]
    name = "rasa_spell_checker"

    def __init__(self, component_config=None):
        super(RasaSpellChecker, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        # Nothing to learn, so training is a no-op.
        pass

    def process(self, message, **kwargs):
        entity_list = message.get("entities")
        donot_replace = []
        if entity_list:
            message.set("entities", [])
            for e in entity_list:
                print(e)
                # Remember the values of "name" entities so they are not "corrected".
                if e["entity"] == "name":
                    donot_replace.append(e["value"])
        tokens = [t.text for t in message.get("tokens")]
        correct_tokens = [spell(t) if t not in donot_replace else t for t in tokens]
        # Write the corrected text back onto the existing token objects.
        for i, t in enumerate(message.get("tokens")):
            t.text = correct_tokens[i]
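Unlike the spell checker, a TF-IDF component would have real work to do in train: fitting the vocabulary and IDF weights on the training examples, so that process only has to vectorize the incoming message and pick the closest example. Below is a framework-free sketch of that split. `TfidfIntentMatcher`, the intent names, and the smoothed IDF formula are all hypothetical; the class does not inherit from Rasa's `Component`, and the returned dictionary only mimics the name/confidence shape of an intent prediction.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIntentMatcher:
    """Hypothetical stand-in for a custom NLU component (not a Rasa class)."""

    def __init__(self):
        self.idf = {}
        self.examples = []  # list of (unit-length TF-IDF vector, intent name)

    def _vectorize(self, text):
        counts = Counter(t for t in tokenize(text) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def train(self, labelled_examples):
        """'Training' here means fitting IDF weights and storing vectors."""
        docs = [set(tokenize(text)) for text, _ in labelled_examples]
        doc_freq = Counter(term for doc in docs for term in doc)
        n_docs = len(docs)
        self.idf = {
            term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()
        }
        self.examples = [
            (self._vectorize(text), intent) for text, intent in labelled_examples
        ]

    def process(self, text):
        """Return the intent of the most similar training example."""
        query = self._vectorize(text)
        best_intent, best_score = None, 0.0
        for vec, intent in self.examples:
            score = sum(w * vec.get(t, 0.0) for t, w in query.items())
            if score > best_score:
                best_intent, best_score = intent, score
        return {"name": best_intent, "confidence": best_score}


matcher = TfidfIntentMatcher()
matcher.train([
    ("What is an escrow account?", "faq_escrow_account"),  # hypothetical intents
    ("What is an escrow cushion?", "faq_escrow_cushion"),
])
result = matcher.process("explain the escrow cushion")
print(result["name"])
```

Inside an actual Rasa 1.x component, I would expect process to store this dictionary on the message under the "intent" key rather than return it, but treat that as an assumption to check against the custom component docs for your version.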