Reinforcement learning FAQ

Hello,

I am building a FAQ chatbot that answers single user questions. There is currently no actual NLU applied to the user’s question: the input message is passed directly to a custom action, which runs an ElasticSearch query. If there are hits, the user is shown the best one. At the end, Rasa asks the user to rate the provided solution. How can I use the triple “User Question” - “Article Shown” - “Score” to improve future FAQ searches and responses? I am wondering whether this is purely an ElasticSearch querying/filtering problem, or whether the feedback could be used to progressively build an NLU model.
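
To make the data concrete, here is a minimal sketch of the feedback I am collecting. Everything in it is an assumption for illustration: the `FeedbackRecord` name, the article IDs, the 1-5 rating scale and the sample rows are made up, and the per-article average is just one simple way to summarize the triples, not something I have settled on.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackRecord:
    question: str    # raw user message passed to the custom action
    article_id: str  # hypothetical ID of the search hit that was shown
    score: int       # rating given by the user (scale assumed to be 1-5)


def mean_score_per_article(records):
    """Average the user ratings for each article that was shown."""
    totals = defaultdict(lambda: [0, 0])  # article_id -> [sum, count]
    for r in records:
        totals[r.article_id][0] += r.score
        totals[r.article_id][1] += 1
    return {article: s / n for article, (s, n) in totals.items()}


# Made-up sample rows, only to show the shape of the data.
records = [
    FeedbackRecord("how do I reset my password", "kb-12", 5),
    FeedbackRecord("forgot password", "kb-12", 4),
    FeedbackRecord("change email address", "kb-12", 1),
    FeedbackRecord("change email address", "kb-07", 5),
]

print(mean_score_per_article(records))
```

Note that an average like this ignores the question text entirely, which is exactly the part I do not know how to exploit.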

Thank you