Last year, I tried to use Rasa for a project that needed to do open-ended chitchat across a few dozen topics while incorporating a bit of conversational memory. It ended up not working so well, and I understand that's not really what Rasa is for. So I recently started working on a different approach using neural conversation models (initially I'm using GPT-2 fine-tuned on a topic-oriented conversation corpus). Some of the responses are magic, but some are garbage because of the lack of any real conversational memory/entities. I'm thinking about ways to abstract the training data so the model works with slots instead of specific entity instances, but it seems like someone must be working on this already.
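For context, here's roughly how I'm querying the fine-tuned model right now - I just concatenate the last few turns into a prompt so it sees some conversational context. The model path and the speaker tags are my own conventions, nothing standard:

```python
# Sketch of how I'm currently querying the fine-tuned GPT-2; the model path
# and the "user:"/"bot:" speaker tags are just my own conventions.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MODEL_DIR = "path/to/finetuned-gpt2"  # hypothetical local path
tokenizer = GPT2Tokenizer.from_pretrained(MODEL_DIR)
model = GPT2LMHeadModel.from_pretrained(MODEL_DIR)

def generate_reply(history, max_new_tokens=60):
    """history: list of (speaker, text) tuples for the last few turns."""
    prompt = "\n".join(f"{speaker}: {text}" for speaker, text in history) + "\nbot:"
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids,
        max_length=input_ids.shape[-1] + max_new_tokens,
        do_sample=True,
        top_p=0.9,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Keep only the newly generated tokens, up to the first line break.
    reply = tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return reply.split("\n")[0].strip()

# e.g. generate_reply([("user", "I went hiking near Tahoe last weekend"),
#                      ("bot", "Nice! How was the weather?"),
#                      ("user", "Cold, but the views were worth it")])
```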
If only I could jam together a generative model (which doesn't require so much handcrafting) with Rasa (which can actually do things that make sense). It's like trying to get the functionality of Google Duplex out of Google Meena.
So the question is whether these will always be separate worlds, or whether there are efforts to bring the two together somehow - either by incorporating a generative chitchat model into Rasa or by adding Rasa-type capabilities to generative models. I'd like to build on what's already being worked on if possible.
Hi @mmm3bbb, I like your thinking! We are also exploring this as a research topic, but as I argue here, it'll take a fair bit of ingenuity to merge these two worlds.
If you make any headway, I'd love to hear about it. As always, we'll share things early for feedback from the community.
You can query your generative model in a custom action, like we do with the response selector, but with additional slot/context info from the tracker.
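Something like this, as a rough sketch - `generate_reply()` and the `topic` slot are placeholders for your own model wrapper and slots:

```python
from typing import Any, Dict, List, Text

from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

class ActionGenerativeChitchat(Action):
    """Defer the response to the generative model, passing along
    whatever context already lives in the tracker."""

    def name(self) -> Text:
        return "action_generative_chitchat"

    def run(
        self,
        dispatcher: CollectingDispatcher,
        tracker: Tracker,
        domain: Dict[Text, Any],
    ) -> List[Dict[Text, Any]]:
        user_text = tracker.latest_message.get("text", "")
        topic = tracker.get_slot("topic")  # placeholder slot name
        # Prefix the prompt with slot/context info so the model isn't starting cold.
        prompt = f"[topic: {topic}] {user_text}" if topic else user_text
        reply = generate_reply(prompt)  # placeholder wrapper around your fine-tuned model
        dispatcher.utter_message(text=reply)
        return []
```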
Alternatively, you can create a custom policy that contains the generative model.
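If you go the policy route, here's a very rough sketch, assuming the Rasa 1.x policy interface (the action name and threshold are placeholders). This version just hands low-confidence turns to the generative custom action; you could also call the model directly inside the policy:

```python
from typing import Any, List, Text

from rasa.core.domain import Domain
from rasa.core.policies.policy import Policy
from rasa.core.trackers import DialogueStateTracker

class GenerativeChitchatPolicy(Policy):
    """Hand the turn to the generative custom action whenever NLU confidence
    drops below a threshold; otherwise stay out of the way."""

    def __init__(self, priority: int = 1, nlu_threshold: float = 0.4) -> None:
        super().__init__(priority=priority)
        self.nlu_threshold = nlu_threshold

    def train(self, training_trackers, domain: Domain, **kwargs: Any) -> None:
        pass  # nothing to train

    def predict_action_probabilities(
        self, tracker: DialogueStateTracker, domain: Domain
    ) -> List[float]:
        probabilities = [0.0] * domain.num_actions
        intent = tracker.latest_message.intent or {}
        if intent.get("confidence", 1.0) < self.nlu_threshold:
            probabilities[domain.index_for_action("action_generative_chitchat")] = 1.0
        return probabilities

    def persist(self, path: Text) -> None:
        pass  # nothing to persist

    @classmethod
    def load(cls, path: Text) -> "GenerativeChitchatPolicy":
        return cls()
```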
That’s a great writeup @amn41. I think your ‘end-to-end’ comment pretty much summarizes the issue - it’s a bit like trying to integrate a parrot, I guess.
I’ll play with a few ideas when I get a chance. I think there are probably some NLG opportunities around linguistic variation that could make things less robotic, especially when filling slots in forms. That should be pretty straightforward.
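For example, just leaning on Rasa's built-in response variations for form prompts would help a bit before any model is involved (the response name below is made up; older 1.x versions call this section `templates:`):

```yaml
responses:
  utter_ask_topic:     # hypothetical prompt asked while filling a form slot
  - text: "What would you like to chat about?"
  - text: "Any particular topic on your mind?"
  - text: "Pick a topic and we'll go from there."
```

Rasa picks one variation at random each time, which already makes repeated slot prompts feel less canned; a generative NLG layer could take that further.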
Also, handling the fuzzier bits around the edges to avoid “Sorry, I didn’t understand you” seems doable, along with incorporating some “OK, let’s get back to business” type training data to keep things from going too far afield. I can see how to hand a user off to the generative model (on low prediction confidence), but what happens after the first generated utterance is a mystery until something’s built.
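For the hand-off itself, I was picturing something like the Rasa 1.x FallbackPolicy pointed at a generative custom action, so low-confidence turns go to the model instead of the apology. The thresholds and the action name here are just guesses:

```yaml
policies:
  - name: FallbackPolicy
    nlu_threshold: 0.4                                   # below this NLU confidence, fall back
    core_threshold: 0.3                                  # below this Core confidence, fall back
    fallback_action_name: "action_generative_chitchat"   # hypothetical generative custom action
```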