Hello @YYtheFenix. Welcome to our community and thanks for your question!
Rasa is not currently designed to learn from conversations online (i.e., while interacting with customers). Instead, learning happens offline in a batch setting: you accumulate conversations with customers, use successful dialogs as additional training data, and correct unsuccessful dialogs through manual annotation. You then run `rasa train` on the new data and `rasa test` to validate your model prior to redeployment. You can find the details in the Rasa docs, and you can check our blog post about conversation-driven development, which outlines the current best practices for building bots with Rasa.
You are correct that in reinforcement learning (RL) the agent can be set up to learn interactively from only a weak reward signal (successful vs. failed dialogs). This is an idea we plan to experiment with in the future; at the moment, we rely on supervised learning from successful dialogs.
Let us know if you have further questions!