I have been investigating the same problem. The way I see it, Rasa’s entity extraction is, by design, non-contextual: it recognizes entities and classifies the utterance to an intent, but each utterance is processed in isolation.
The drawback is that EVERYTHING about human language is contextual. Think of all the times you’ve been confused by something someone said in a conversation until they provided the context.
“Oh, you were referring back to that movie we watched last week. Now I understand.”
Another user in a different forum on this site has done a POC using micro-services (Providing conversation context to the NLU using microservices). Conceptually, the system first classifies the utterance to an intent, then forwards it to a Rasa server instance trained only on data for that intent. Essentially, it puts the utterance in context before analyzing it.
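To make the routing idea concrete, here is a minimal sketch, assuming each intent-specific Rasa NLU server is already running and exposes the standard `POST /model/parse` endpoint. The port numbers, intent names, and the `parse_with_context` helper are all hypothetical, not something from the linked POC.

```python
import requests

GENERAL_NLU = "http://localhost:5005"      # model trained on all intents
INTENT_SERVERS = {                         # hypothetical per-intent servers
    "book_movie": "http://localhost:5006",
    "ask_showtime": "http://localhost:5007",
}

def parse_with_context(text: str) -> dict:
    # Step 1: classify the utterance with the general model.
    general = requests.post(f"{GENERAL_NLU}/model/parse", json={"text": text}).json()
    intent = general["intent"]["name"]

    # Step 2: forward to the server trained only on that intent's data,
    # so entity extraction happens "in context".
    server = INTENT_SERVERS.get(intent)
    if server is None:
        return general                     # fall back to the general result
    return requests.post(f"{server}/model/parse", json={"text": text}).json()

if __name__ == "__main__":
    print(parse_with_context("Book two tickets for the movie we talked about"))
```

Each per-intent server would just be a separate Rasa instance trained on that intent’s slice of the NLU data, which is exactly where the maintenance burden comes from.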
The micro-service idea looks achievable, but it would be a maintenance nightmare. I’d like to find an easier way to achieve the same goal.