Hey guys, simple question - is it possible to do zero-downtime model replacement through the API?
When I start a server with rasa run --enable-api and then load a new model via the API (PUT /model), Rasa doesn’t serve traffic while the model is being replaced. The endpoint /webhooks/callback/webhook doesn’t respond; it waits for the model replacement to finish.
Any chance of some sort of “atomic swap” of the models?
There is no way around it. Rasa is slow to start up, even on beefier servers. A model always downloads additional model dependencies when it loads. You could minimize the time if you manage to cache the downloaded dependency models between the servers, but I haven’t tried that.
If you use multiple Rasa instances per bot, the problems multiply, and it would probably make sense to use Kubernetes (K8s) for that.
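For what it's worth, with several instances you could roll the model out one server at a time, so the others keep answering while one is busy loading. This is only a rough sketch and I haven't run it against a real deployment. It assumes the servers were started with --enable-api and that PUT /model accepts a JSON body with a model_file path. The drain and restore callbacks are hypothetical hooks for whatever takes an instance out of your load balancer and puts it back; they are not part of Rasa.

```python
import json
import urllib.request
from typing import Callable, Iterable, List


def put_model(base_url: str, model_file: str, timeout: float = 600.0) -> int:
    """Ask one Rasa server to load model_file via PUT /model; return the HTTP status."""
    body = json.dumps({"model_file": model_file}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/model",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    # The call blocks until the server has finished loading, hence the long timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def rolling_replace(
    instances: Iterable[str],
    model_file: str,
    drain: Callable[[str], None],    # hypothetical: remove instance from rotation
    restore: Callable[[str], None],  # hypothetical: put instance back in rotation
    load: Callable[[str, str], int] = put_model,
) -> List[str]:
    """Replace the model on one instance at a time; return the updated instances."""
    updated: List[str] = []
    for base_url in instances:
        drain(base_url)
        try:
            status = load(base_url, model_file)
            if status not in (200, 204):
                raise RuntimeError(f"{base_url} answered {status} to PUT /model")
            updated.append(base_url)
        finally:
            # Always put the instance back, even if the load failed.
            restore(base_url)
    return updated
```

Each single instance is still blocked while it loads, so this doesn't give you an atomic swap. It only hides the wait behind the other instances, which is roughly what a Kubernetes rolling update would do for you.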