I am wondering how we can get the most accurate picture of a bot's performance after it has been deployed.
• Based on test data we can get insight into the accuracy of our chatbot, although I assume some overfitting will occur. Furthermore, since we are in a sense trying to quantify a 'subjective' measure, I question how useful this quantification will be in the end.
• What we should definitely capture, probably in a pilot, is user-experience information about the conversation. Based on ratings you could check whether the task was completed (even if only at the end of the story, after some initial deviations) and whether the user was satisfied with the conversation. This could be a star rating, with an open-text remark requested as feedback whenever the rating is low.
• Lastly, and most importantly, I was wondering whether we could use the Rasa structure to our advantage to improve the feedback loop in our chatbot, for example by creating a new intent (positive/negative feedback) after asking whether the user felt helped by the chatbot. In case of negative feedback, the bot could ask for clarification (open question/dropdown menu/…) and might be able to correct its mistake.
Especially for the last option, I am wondering which possibilities are currently available to use the structure underlying the Rasa framework to optimize this self-learning loop; a rough sketch of what I have in mind is below. Would love some feedback!
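To make that last bullet concrete, here is a minimal sketch (assuming a recent rasa_sdk; the action name `action_ask_feedback` and the intents `feedback_positive`/`feedback_negative` are placeholders I made up, and would still need to be registered in the domain and covered by stories) of a custom action that asks the question and offers buttons whose payloads map directly onto the two feedback intents:

```python
# Sketch of a custom action that asks the user whether the bot helped.
# The button payloads trigger assumed intents (feedback_positive/negative),
# which stories can then branch on, e.g. to ask for clarification.
from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher


class ActionAskFeedback(Action):
    """Asks for feedback at the end of a conversation."""

    def name(self) -> Text:
        # Assumed action name; list it under `actions:` in the domain.
        return "action_ask_feedback"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        # Buttons keep the answer trivially classifiable as positive/negative.
        dispatcher.utter_message(
            text="Did I manage to help you today?",
            buttons=[
                {"title": "Yes, thanks!", "payload": "/feedback_positive"},
                {"title": "No, not really", "payload": "/feedback_negative"},
            ],
        )
        return []
```

On negative feedback, a story could then route to another action that asks the open clarification question and stores the answer for later review.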
Hey @BlackSwan. You already have some great points about evaluation. You should definitely start by testing the model with the evaluation scripts; you can also visualize the training stories to see whether the layout of the covered conversations makes sense.
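For reference, in current Rasa versions both steps are available through the CLI (`rasa test core` and `rasa visualize`); a small wrapper like the following runs them, with the paths below only being example values you would adapt to your project:

```python
# Sketch: run story evaluation and story visualization via the Rasa CLI.
import subprocess

# Evaluate the dialogue model against a held-out set of test stories.
subprocess.run(
    ["rasa", "test", "core", "--stories", "tests/test_stories.yml", "--out", "results"],
    check=True,
)

# Render the training stories as a graph (writes graph.html by default).
subprocess.run(["rasa", "visualize"], check=True)
```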
Once that’s covered and works well, it’s a matter of evaluating the usefulness of your bot. Checking the conversations and assessing whether users achieved their goal could be one metric. Feedback from the users is very helpful as well (mid-conversation or at the end of the conversation).
As for the feedback loop: what we have already implemented is interactive learning, which is specifically built to improve the bot through the feedback you provide while talking to it.
Interactive learning is too complicated for me: once I have a conversation, writing it down directly as a story is much faster and easier than operating the interactive learning program. And without a conversation at hand, I always forget which steps I have already taken.
Thanks for the responses! What I am looking for is really a feedback loop after interactive learning, which is a great tool for training the chatbot initially but is no longer available once the bot is deployed. I am really wondering how we could collect and integrate feedback after deployment, so the chatbot can learn from new conversations and optimize itself continuously.
It is a great discussion and I would love to hear some more creative ideas for solving this issue!
Our use case is collecting a certain amount of information from a user and then transferring the conversation to a human agent, so we can define success as the conversation reaching the ActionTransfer. We could therefore label each conversation as successful/unsuccessful, but I would be interested in how to automate feeding the successful examples back into the dialogue model.
At this point the simple thing to do would be to store all the conversations as stories, filter out the successful ones via a script, and then have a batch job feed them into the training data and retrain the model (a rough sketch of such a script is below).
But this would have to be done on a per-bot basis; it’s not clear yet how that could be abstracted and generalized.
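Here is roughly what I mean, under some assumptions I should flag: each conversation has been exported from the tracker store as a JSON list of tracker events, the transfer action is registered as `action_transfer`, and the file paths are just examples. The script keeps only conversations that reached the transfer action, rewrites them as Markdown stories, and appends them to a training data file for the next retrain:

```python
# Sketch of a batch job: filter successful conversations and append them
# as stories. Event schema (user/action events) is assumed, not guaranteed.
import json
from pathlib import Path

SUCCESS_ACTION = "action_transfer"                 # assumed action name
CONVERSATIONS_DIR = Path("exported_trackers")      # example path
STORIES_FILE = Path("data/generated_stories.md")   # example path

# Housekeeping actions that carry no signal for the dialogue policy.
SKIPPED_ACTIONS = {"action_listen", "action_session_start"}


def events_to_story(events, story_name):
    """Convert a list of tracker events into a Markdown story block."""
    lines = [f"## {story_name}"]
    for event in events:
        if event.get("event") == "user":
            intent = event.get("parse_data", {}).get("intent", {}).get("name")
            if intent:
                lines.append(f"* {intent}")
        elif event.get("event") == "action":
            action = event.get("name")
            if action and action not in SKIPPED_ACTIONS:
                lines.append(f"  - {action}")
    return "\n".join(lines)


def collect_successful_stories():
    stories = []
    for path in sorted(CONVERSATIONS_DIR.glob("*.json")):
        events = json.loads(path.read_text())
        actions = {e.get("name") for e in events if e.get("event") == "action"}
        if SUCCESS_ACTION in actions:
            stories.append(events_to_story(events, f"successful_{path.stem}"))
    return stories


if __name__ == "__main__":
    new_stories = collect_successful_stories()
    with STORIES_FILE.open("a") as f:
        for story in new_stories:
            f.write("\n\n" + story)
    print(f"Appended {len(new_stories)} stories; retrain with `rasa train`.")
```

This is per-bot by construction (the success action and the skipped actions are hard-coded), which is exactly the generalization problem mentioned above.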
But the problem you’re thinking about is very relevant and the solution would be very useful.