Could you not just use the test features built into Rasa to accomplish the same thing? Rasa runs through all the stories you have written as tests, and if any of them fail it writes them to a file for you to review. I have a post about this here: "How to handle failed_stories in Evaluating core models".
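To make that concrete, here is a rough sketch of how you might check the output after running something like `rasa test core --stories <your test stories>`. It assumes the failures file is in Markdown story format, where each story starts with a `## ` header. The helper name `count_failed_stories` and the path in the comment are my own placeholders, not part of Rasa, and the file name and format may differ in your Rasa version.

```python
from pathlib import Path


def count_failed_stories(report_text: str) -> int:
    """Count the story blocks in a failed-stories report.

    Assumption: the report uses the Markdown story format, so each
    failed story begins with a line starting with '## '.
    """
    return sum(1 for line in report_text.splitlines() if line.startswith("## "))


def count_failed_stories_in_file(path: str) -> int:
    """Read a report from disk and count the failed stories in it."""
    return count_failed_stories(Path(path).read_text(encoding="utf-8"))


# Hypothetical usage; the path depends on where your test run writes its output:
# failures = count_failed_stories_in_file("results/failed_stories.md")
# if failures:
#     print(f"{failures} test stories failed, see the report for details")
```

You could call this at the end of a CI job and fail the build whenever the count is above zero.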
Would this give you what you want?