What happens to an ongoing conversation when the Rasa server receives a REST API call to load a different model? Will all ongoing conversations stop or is there a smooth transition?
Ideally, you would run Rasa in a Kubernetes cluster, so that while one instance is loading the new model, conversations continue to be served by the other instances. Otherwise, a single instance will not respond while its model is being replaced.
Thanks for the answer. I have several related questions though - it would be great if you can share your thoughts.
Does Kubernetes spawn the second service (pod) automatically because the first pod is unresponsive or do you mean I need to run the second pod all the time?
Will this setup work with an external model storage or does it require respawning pods with the new model as a mounted volume?
Will this setup require a lock store besides a tracker store?
Does Kubernetes spawn the second service (pod) automatically because the first pod is unresponsive or do you mean I need to run the second pod all the time?
There are multiple ways to do this in a Kubernetes deployment. I’m most familiar with rolling updates, but you could also put the instances behind a load balancer and rely on liveness/readiness probes to route traffic away from a pod while it reloads.
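As a rough sketch of the rolling-update approach, a Kubernetes Deployment like the one below keeps at least one replica serving traffic while another is replaced. The names, image tag, and probe path here are illustrative assumptions, not values from the thread:

```yaml
# Sketch of a Deployment for Rasa with a rolling-update strategy.
# With maxUnavailable: 0, Kubernetes only terminates an old pod after
# a new one passes its readiness probe, so requests keep being served.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rasa          # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rasa
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below the desired replica count
      maxSurge: 1         # spin up one extra pod during the update
  template:
    metadata:
      labels:
        app: rasa
    spec:
      containers:
        - name: rasa
          image: rasa/rasa:latest   # pin a real version in practice
          ports:
            - containerPort: 5005
          readinessProbe:
            httpGet:
              path: /               # assumes the root endpoint answers when ready
              port: 5005
```

A pod that is busy loading a model fails its readiness probe and is taken out of the Service's endpoints until it recovers, which is what makes the transition smooth for ongoing conversations.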
Will this setup work with an external model storage or does it require respawning pods with the new model as a mounted volume?
You can use external model storage; whether pods need to be respawned depends on the method you choose to implement this.
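For the external-storage option, Rasa can poll a model server via `endpoints.yml`, in which case each pod pulls the new model itself and no respawn is needed. A minimal sketch, assuming a hypothetical model URL:

```yaml
# endpoints.yml — fetch models from an HTTP model server.
# The URL is a placeholder; point it at your own storage.
models:
  url: http://example.com/models/latest
  wait_time_between_pulls: 100   # seconds between checks for a new model
```

If you instead bake the model into the image or a mounted volume, then a rollout (respawning pods) is what delivers the update.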
Will this setup require a lock store besides a tracker store?
You always need a lock store, even with a single instance.
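For reference, the lock store is also configured in `endpoints.yml`. A minimal sketch using Redis, with placeholder connection details:

```yaml
# endpoints.yml — lock store backed by Redis, so that messages for the
# same conversation are processed in order even across multiple pods.
lock_store:
  type: "redis"
  url: localhost   # placeholder host
  port: 6379
```

With a single instance the default in-memory lock store suffices; once you run multiple replicas behind a load balancer, a shared lock store like this one is required so two pods never process the same conversation concurrently.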