Research for Rasa X - How to select the conversations that would benefit the chatbot most from the thousands of conversations available?

(Shyam Swaroop) #1

I present two scenarios followed by a proposal for conducting research.

Scenario 1: Imagine you have thousands of conversations that could help your chatbot get more intelligent. Labelling each conversation takes 2-3 minutes, so you can’t expect someone to go through all of them and label them. Intuitively, labelling some of the conversations won’t help the chatbot at all. On the other hand, there are important conversations that you may miss if you select randomly from a pool of thousands of conversations. This calls for a better approach.

Scenario 2: Imagine your chatbot is at an early stage. You are feeding it examples of conversations. Each time the bot gets trained, you get some metric (accuracy, maybe) on intent classification. But now you are clueless about how to improve this accuracy. For starters, you may analyse the bot’s performance for different intents and work on the badly performing ones. But this analysis takes time. Plus, the problem may sit at a finer-grained level than the intent level, for example in individual training examples.
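
To make the per-intent analysis in Scenario 2 concrete, here is a minimal sketch in plain Python. It assumes you already have the true and predicted intent for each evaluation example as two parallel lists; the function names `per_intent_accuracy` and `worst_intents` are hypothetical helpers, not part of Rasa or Rasa X.

```python
from collections import defaultdict


def per_intent_accuracy(true_intents, predicted_intents):
    """Return {intent: accuracy} over the examples whose true label is that intent."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in zip(true_intents, predicted_intents):
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    return {intent: correct[intent] / total[intent] for intent in total}


def worst_intents(true_intents, predicted_intents, k=3):
    """Return the k (intent, accuracy) pairs with the lowest accuracy, worst first."""
    accuracy = per_intent_accuracy(true_intents, predicted_intents)
    return sorted(accuracy.items(), key=lambda item: item[1])[:k]


# Made-up evaluation data, purely for illustration.
truths = ["greet", "greet", "bye", "bye", "order", "order"]
preds = ["greet", "greet", "bye", "greet", "bye", "greet"]
print(worst_intents(truths, preds, k=2))
```

This only tells you which intents are weak, not which individual examples are causing the trouble, which is exactly the limitation described above.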

You get the gist. We need algorithms that guide us to work more efficiently. How do we solve this?

Proposal: How about we explore techniques like Bayesian optimisation for this? But what do we maximise? I would say, let’s maximise the loss function, or maybe the inverse of a performance metric (accuracy, maybe). It feels unusual to maximise a loss function rather than minimise it, but that is the idea. I repeat: let’s maximise a loss function and automatically sample the examples that maximise it, i.e. the conversations on which the current model does worst and therefore has the most to learn from.
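
One simple way to try the "sample the examples with the highest loss" idea, sketched here under stated assumptions: unlabelled conversations have no true intent, so the loss cannot be computed directly. As a stand-in, the sketch ranks conversations by the entropy of the model's predicted intent distribution, which equals the cross-entropy loss the model expects of itself under its own predictions. This is plain uncertainty sampling from active learning, not Bayesian optimisation, and the names `predictive_entropy` and `select_for_labelling` as well as the input format (a dict of conversation id to intent probabilities) are hypothetical, not a Rasa X API.

```python
import math


def predictive_entropy(probs):
    """Entropy of a predicted intent distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labelling(predictions, budget):
    """Pick the `budget` conversation ids with the highest predictive entropy.

    `predictions` maps a conversation id to the model's intent probabilities.
    """
    ranked = sorted(
        predictions,
        key=lambda conv_id: predictive_entropy(predictions[conv_id]),
        reverse=True,
    )
    return ranked[:budget]


# Made-up model outputs for three conversations, purely for illustration.
predictions = {
    "conv_a": [0.98, 0.01, 0.01],  # model is confident
    "conv_b": [0.40, 0.35, 0.25],  # model is very unsure
    "conv_c": [0.60, 0.30, 0.10],  # somewhere in between
}
print(select_for_labelling(predictions, budget=2))
```

With a labelling budget of 2, the confident conversation is skipped and the two the model is least sure about are sent for labelling. A Bayesian-optimisation variant would need a different setup; this is only a baseline to compare against.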