Behavior when changing Rasa models during a conversation?

What happens to an ongoing conversation when the Rasa server receives a REST API call to load a different model? Will all ongoing conversations stop or is there a smooth transition?
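For context, this is roughly the call I mean (a minimal sketch; the host, token, and model path are placeholders):

```python
import requests

# Ask a running Rasa server (started with --enable-api) to replace its model.
# Host, token, and model path are placeholders.
RASA_URL = "http://localhost:5005"

resp = requests.put(
    f"{RASA_URL}/model",
    json={"model_file": "/app/models/20240101-120000.tar.gz"},
    params={"token": "my_rasa_token"},  # only needed if token auth is enabled
)
resp.raise_for_status()  # the API normally answers 204 No Content on success
```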

Ideally, you would run Rasa in a Kubernetes cluster: while one instance is loading the new model, conversations continue to be served by the other instances. With a single instance, the server will not respond while the model is being replaced.

Thanks for the answer. I have several related questions, though; it would be great if you could share your thoughts.

Does Kubernetes spawn the second service (pod) automatically because the first pod is unresponsive or do you mean I need to run the second pod all the time?

Will this setup work with an external model storage, or does it require respawning pods with the new model as a mounted volume?

Will this setup require a lock store besides a tracker store?

Does Kubernetes spawn the second service (pod) automatically because the first pod is unresponsive or do you mean I need to run the second pod all the time?

There are multiple ways to do this in a Kubernetes deployment. I'm most familiar with rolling updates, but you could also put the pods behind a load balancer and rely on a liveness probe.
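As a rough sketch of the rolling-update variant, using the official Kubernetes Python client; the deployment and container names, namespace, image tag, and probe paths are assumptions on my side rather than values from any particular Helm chart:

```python
from kubernetes import client, config

# Patch an existing Rasa deployment so Kubernetes rolls pods one at a time
# and only routes traffic to pods whose HTTP probes succeed.
# Deployment/container names, namespace, image tag and probe paths are assumptions.
config.load_kube_config()
apps = client.AppsV1Api()

patch = {
    "spec": {
        "strategy": {
            "type": "RollingUpdate",
            "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
        },
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "rasa",  # merged with the existing container of this name
                        "image": "rasa/rasa:3.6.0",  # new image (e.g. with the model baked in)
                        "livenessProbe": {
                            "httpGet": {"path": "/", "port": 5005},
                            "initialDelaySeconds": 30,
                            "periodSeconds": 10,
                        },
                        "readinessProbe": {
                            # /status may require a token if token auth is enabled
                            "httpGet": {"path": "/status", "port": 5005},
                            "initialDelaySeconds": 10,
                            "periodSeconds": 5,
                        },
                    }
                ]
            }
        },
    }
}

apps.patch_namespaced_deployment(name="rasa", namespace="rasa", body=patch)
```

With `maxUnavailable: 0`, an old pod is only terminated once a replacement passes its readiness probe, so traffic keeps flowing during the rollout (provided trackers live in a shared tracker store).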

Will this setup work with an external model storage, or does it require respawning pods with the new model as a mounted volume?

You can use external model storage; whether you need to respawn pods depends on the method you choose to implement this.
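For example, with external storage you can tell an already running server to pull and load a new model over the same HTTP API, without respawning the pod. This is only a sketch based on my reading of the Rasa HTTP API docs, so double-check the exact request fields against your Rasa version; the URLs and model names are placeholders:

```python
import requests

RASA_URL = "http://rasa.example.internal:5005"  # placeholder service URL

# Option 1: pull a named model from remote storage (aws/gcs/azure) that the
# server already has credentials for.
resp = requests.put(
    f"{RASA_URL}/model",
    json={"model_file": "20240101-120000.tar.gz", "remote_storage": "aws"},
)
resp.raise_for_status()

# Option 2: point the server at an HTTP model server that it will poll for
# new models on its own, so no further API calls are needed per model update.
resp = requests.put(
    f"{RASA_URL}/model",
    json={
        "model_server": {
            "url": "https://models.example.internal/latest",
            "wait_time_between_pulls": 60,  # seconds between polls
        }
    },
)
resp.raise_for_status()
```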

Will this setup require a lock store besides a tracker store?

You always need a lock store, even with a single instance; Rasa falls back to an in-memory lock store by default, but once you run multiple instances you need a shared one such as Redis, alongside a shared tracker store.