I have to build a bot, and for the first phase of it I have more than 500 unique FAQ questions. How can I handle such a large number of FAQs? Using intents will not be a good approach, as far as I know. Can anyone suggest a good approach to start with? @souvikg10 can you please help?
I think 500 intents should be fine. I'm not sure why you think it isn't a good idea?
You can organize your training data in data/faq_question*.yml files and use the response selector, or fetch the answer from a database.
For a large set of single-turn FAQs, it is perhaps easier to rely on the NLU alone and use the predicted intent to fetch the answer from a database.
For multi-turn questions, consider a form.
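To illustrate the database-fetch idea, here is a minimal sketch. The table name, intent names, and answers are all invented for illustration; in a real Rasa bot this lookup would sit inside a custom action that reads the predicted intent from the tracker.

```python
import sqlite3

# Minimal sketch of the "NLU intent -> database lookup" idea.
# Schema and data are made up; a real bot might use Postgres instead.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE faq_answers (intent TEXT PRIMARY KEY, answer TEXT)")
conn.executemany(
    "INSERT INTO faq_answers VALUES (?, ?)",
    [
        ("faq_opening_hours", "We are open 9am-5pm, Monday to Friday."),
        ("faq_reset_password", "Use the 'Forgot password' link on the login page."),
    ],
)

def fetch_answer(intent_name: str) -> str:
    """Return the stored answer for a predicted intent, or a fallback."""
    row = conn.execute(
        "SELECT answer FROM faq_answers WHERE intent = ?", (intent_name,)
    ).fetchone()
    return row[0] if row else "Sorry, I don't have an answer for that yet."

print(fetch_answer("faq_opening_hours"))
```

Keeping the answers in a database like this means content editors can change answer text without retraining the model.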
Okay, I haven't used it for 500 intents, which is why I asked. Will it be okay to treat the FAQs as 500 retrieval intents, like
faq/<faq-1> and so on?
Alternatively, I thought I could create text embeddings for the 500 questions and then find the answer using cosine similarity.
Which approach is better, in your opinion?
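For context, the embedding-plus-cosine-similarity idea can be sketched like this. A real setup would use a sentence-embedding model; here a toy bag-of-words embedding stands in so the example stays self-contained, and the FAQ texts are invented.

```python
import math

# Toy FAQ store: question -> answer (invented examples).
faqs = {
    "What are your opening hours?": "We are open 9am-5pm.",
    "How do I reset my password?": "Use the 'Forgot password' link.",
}

# Vocabulary built from the FAQ questions themselves.
vocab = sorted({w for q in faqs for w in q.lower().split()})

def embed(text: str) -> list:
    """Bag-of-words count vector over the FAQ vocabulary (a stand-in
    for a real sentence embedding)."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-9)

def answer(user_text: str) -> str:
    """Return the answer of the FAQ most similar to the user input."""
    best = max(faqs, key=lambda q: cosine(embed(q), embed(user_text)))
    return faqs[best]

print(answer("when are you open"))
```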
Also, how should I place my retrieval intents (500 of them) in nlu.yml and their responses in domain.yml? Is there a good way to handle them? @souvikg10 your help would be more than useful to me.
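For reference, the retrieval-intent layout looks roughly like this. Intent names and answers are invented for illustration, and the two fragments live in separate files; retrieval intents share a prefix (here faq/), and each response key pairs utter_ with the full intent name.

```yaml
# nlu.yml (fragment)
version: "3.1"
nlu:
  - intent: faq/opening_hours
    examples: |
      - When are you open?
      - What are your opening hours?
  - intent: faq/reset_password
    examples: |
      - How do I reset my password?

# domain.yml (fragment)
responses:
  utter_faq/opening_hours:
    - text: "We are open 9am-5pm, Monday to Friday."
  utter_faq/reset_password:
    - text: "Use the 'Forgot password' link on the login page."
```

You would also need a ResponseSelector component in your pipeline config for the retrieval intents to be resolved.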
I think you can try retrieval intents, but I haven't really worked with them.
In our case, we used the NLU on its own in the past and mapped each predicted intent to a database call to fetch the answer. I did this for about 100 or so intents, so I wouldn't expect 500 to be any different, except that managing them at that scale could get tricky.
In Rasa, when you train a model you pass a directory where your training data is kept, so you can create one nlu.yml per FAQ, or group similar FAQs into one file (account_opening, payments, etc.). It won't change how training is done, but it at least keeps you better organized.
Creating text embeddings isn't any different from what Rasa pipelines are already doing. I used Postgres for this once; it was really neat, using trigrams to compare the similarity between the FAQ question and the user input, but it would fail miserably for anything more clever. Instead, if you want to avoid providing a lot of examples per FAQ, use a pretrained model like spaCy, ConveRT, or Transformers.
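The trigram idea (the same one behind Postgres's pg_trgm module) can be sketched in a few lines. This is a rough approximation of pg_trgm, not its exact padding rules, and it shows why the approach matches near-identical wordings well but fails on paraphrases that share few character trigrams.

```python
def trigrams(text: str) -> set:
    """Character trigrams of the lowercased text (rough padding;
    pg_trgm pads each word separately)."""
    t = f"  {text.lower()} "
    return {t[i:i + 3] for i in range(len(t) - 2)}

def trigram_similarity(a: str, b: str) -> float:
    """Jaccard-style similarity over character trigrams, in [0.0, 1.0]."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

# A small typo still scores high, but a paraphrase scores near zero.
print(trigram_similarity("reset my password", "reset my pasword"))
print(trigram_similarity("reset my password", "I forgot my login credentials"))
```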
You could also try a QnA system, though I'm not sure how well that would work; deepset's Haystack is a nice product.