I am wondering how we could get the most accurate picture of a bot's performance after implementation.
• Based on test data we can get insights into the accuracy of our chatbot, although I assume some overfitting will occur. Furthermore, we are in a sense trying to quantify a 'subjective' measure, so I question how 'useful' this quantification will be in the end.
• What we should definitely capture, probably in a pilot, is user experience information related to the conversation. Based on ratings you could investigate whether task completion occurred (even if only at the end of the conversation, after some initial deviations) and whether the user was satisfied. This could be done with a star rating, and in case of a low star rating the user could leave an open remark as feedback.
• Lastly, and most importantly, I was wondering whether we could use the Rasa structure to our advantage to improve the feedback loop in our chatbot. For example, by creating a new intent (positive/negative feedback) after asking whether the user felt helped by the chatbot. In case of negative feedback, the bot can ask for clarification (open question/dropdown menu/…) and might be able to correct the mistake it made.
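To make the second bullet concrete, here is a minimal sketch of how the star-rating capture could work, independent of Rasa. The threshold, class names, and the satisfaction metric are my own assumptions, purely for illustration:

```python
from dataclasses import dataclass, field
from typing import Optional, List

# Assumption: ratings below this many stars trigger an open follow-up question.
LOW_RATING_THRESHOLD = 3

@dataclass
class ConversationRating:
    stars: int                    # 1-5 star rating given by the user
    remark: Optional[str] = None  # open remark, asked only for low ratings

def needs_open_remark(stars: int) -> bool:
    """Return True when the rating is low enough to ask for an open remark."""
    return stars < LOW_RATING_THRESHOLD

@dataclass
class FeedbackLog:
    ratings: List[ConversationRating] = field(default_factory=list)

    def record(self, stars: int, remark: Optional[str] = None) -> None:
        # In a real pilot the UI would prompt for the remark when needed.
        self.ratings.append(ConversationRating(stars, remark))

    def satisfaction_rate(self) -> float:
        """Fraction of conversations rated at or above the threshold."""
        if not self.ratings:
            return 0.0
        ok = sum(1 for r in self.ratings if r.stars >= LOW_RATING_THRESHOLD)
        return ok / len(self.ratings)
```

The open remarks collected this way could then double as training material for the negative-feedback intent in the last bullet.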
Especially for the last option, I am wondering what possibilities are currently available to use the structure underlying the Rasa framework to optimize the self-learning loop. Would love some feedback!
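As a starting point, the feedback-intent idea could look roughly like this in Rasa's domain/rules format. The intent, response, and rule names below are placeholders of my own, not anything prescribed by Rasa:

```yaml
# domain.yml (fragment) -- feedback intents and the bot's follow-up messages
intents:
  - feedback_positive
  - feedback_negative

responses:
  utter_ask_feedback:
    - text: "Did I manage to help you with your question?"
  utter_ask_clarification:
    - text: "Sorry to hear that. Could you tell me what went wrong?"

# rules.yml (fragment) -- ask for clarification on negative feedback
rules:
  - rule: Ask for clarification on negative feedback
    steps:
      - intent: feedback_negative
      - action: utter_ask_clarification
```

A custom action (via the Rasa SDK) could then store the clarification text, so low-rated conversations feed back into the training data.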