Hi, with the Rasa FAQ approach (example intents → `## intent: faq/ask_channels`, from Tutorial: Building Assistants), what is the maximum number of QnA pairs we can onboard? I understand this comes down to how many classes/labels an NLP classification model can handle; I'm just checking whether anything has been formally communicated about Rasa FAQs. The use case involves 100k+ QnA pairs. Would Rasa standalone be enough, or would Elasticsearch also be required?
“How much can it handle” can be interpreted in two ways.
Rasa does not impose an upper limit, so in that sense … technically, yes, it can handle it. It probably won’t solve your problem, though.
Are these FAQ questions similar to each other? What sort of answers are we retrieving? Does the user specify a query like you might in Google? It may just be more pragmatic to look at this as a retrieval problem that you outsource to Elasticsearch. You can still use Rasa, but you might use a custom action for this particular use case.
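As a toy illustration of that retrieval framing (the FAQ data and scoring here are made up for the example; in practice a custom action would query Elasticsearch instead of an in-memory dict), the idea is simply "rank every stored question against the user query, return the answer for the best match":

```python
from collections import Counter
import math

# Toy FAQ store standing in for an Elasticsearch index (made-up data).
FAQS = {
    "What data types does .NET support?": "See the .NET type system docs.",
    "How do I declare a variable in C#?": "Use `type name = value;`.",
    "What messaging channels does Rasa support?": "See the connectors page.",
}

def _vec(text):
    """Bag-of-words vector: lowercase token counts."""
    return Counter(text.lower().split())

def _cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def best_answer(query):
    """Score every stored question against the query; return the top answer."""
    q = _vec(query)
    scored = [(_cosine(q, _vec(question)), answer) for question, answer in FAQS.items()]
    return max(scored)[1]

print(best_answer("supported data types in .NET"))
# → See the .NET type system docs.
```

A real deployment would replace `best_answer` with a call to a search engine's relevance ranking, but the control flow inside the custom action stays this simple.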
If you give more information I can give a more in-depth answer for your situation. I do agree that 100k responses is a worrying amount to just expect the ResponseSelector to handle perfectly.
Thanks for responding.
The question–answer pairs come from different learning courses. For example, C# .NET is one such course, and there are predefined QnAs drawn from its contents (like "what are the supported data types in .NET?"). Similarly, there are other courses from varied categories.
The expectation is that the bot should be able to retrieve answers from this single long, varied list.
If we use Rasa's FAQ approach directly, then the NLU model has to be trained with an intent for each question (faq/q1), and a single custom action could pick the answer based on the key from the source. Here, as you said, Rasa as a runner may not impose the constraint, but the intent classification model configured in the pipeline may get confused when 100k classes exist (theoretically).
If so, roughly what maximum number of intents/classes should we be safe with in the 'courses' context above? Or should we skip Rasa and move to Elasticsearch, or maybe go hybrid using dialogue, NLP, and Elasticsearch, which would definitely be a little more complex?
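For reference, the per-question-intent layout described above would look roughly like this in the Rasa 1.x markdown NLU format used in the tutorial (intent names and example utterances here are illustrative, not from an actual course):

```
## intent: faq/q1
- what are the supported data types in .NET?
- which data types does .NET support?

## intent: faq/q2
- how do I declare a variable in C#?
- what is the syntax for variable declaration?
```

At 100k questions this file would need at least one intent block, and ideally several paraphrased examples, per question.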
I think the bottleneck is not the number of FAQs per se, but rather how similar the FAQs are to each other. If there are 50 FAQs that are very different and easy to separate, then it's no problem. If there are 5 FAQs that are incredibly similar, then you already have a problem.
What you're describing, though, 100k examples, is going to be tricky because some of the questions/answers are bound to overlap.
One thing about the syntax you describe:
faq/q1 is the syntax for the response selector. This is a good habit, but I don't think we support custom actions for these. There's a technical reason why, which is explained in more detail in this video. The short story: we also encode the response text in order to fetch the best answer, and to do that we need to know the text upfront.