Zero-downtime model replacement?

Hey guys, simple question: is it possible to do zero-downtime model replacement through the API?

When I start a server with rasa run --enable-api and then load a new model via the API (PUT /model), Rasa doesn’t serve traffic while the model is being replaced. The /webhooks/callback/webhook endpoint stops responding and waits for the replacement to finish. Is there any chance of some sort of “atomic swap” of the models?
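
For reference, the call in question could look roughly like the sketch below. It only builds the request and does not send it. The host and port (localhost:5005), the model file name, and the helper name build_model_replace_request are placeholders made up for illustration; the body fields model_file and remote_storage are what I understand the PUT /model endpoint to accept, so please check them against the HTTP API docs for your Rasa version.

```python
import json
import urllib.request


def build_model_replace_request(
    base_url: str, model_file: str, remote_storage: str = "aws"
) -> urllib.request.Request:
    """Build (but do not send) a PUT /model request.

    Assumption: the server accepts a JSON body with `model_file` and
    `remote_storage`; verify against the docs for your Rasa version.
    """
    body = json.dumps(
        {"model_file": model_file, "remote_storage": remote_storage}
    ).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/model",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values, not taken from a real deployment.
req = build_model_replace_request("http://localhost:5005", "model.tar.gz")
# urllib.request.urlopen(req) would send it; while the server loads the
# new model, the webhook endpoint is the part that stalls.
```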

Is the model stored in the same place as Rasa?

No, it’s being loaded from AWS S3.

Hi @mbukovy, I suggest these steps:

  1. Keep the current server running
  2. Start another server with the new model
  3. Switch the traffic to the new server endpoint
  4. If everything is good, kill the old server endpoint
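
The steps above could be scripted along these lines. This is a minimal sketch, not a tested deployment tool: start_new, switch_traffic, stop_old and stop_new are hypothetical callbacks you would implement for your own infrastructure (ECS, a load balancer, and so on), and rasa_is_ready assumes that GET /status answers with HTTP 200 once a model is loaded, which you should confirm for your Rasa version.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def rasa_is_ready(base_url: str, timeout_s: float = 5.0) -> bool:
    """Return True if GET {base_url}/status answers with HTTP 200.

    Assumption: the server returns 200 on /status only once a model is loaded.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/status", timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: not ready yet.
        return False


def blue_green_deploy(
    start_new: Callable[[], None],
    is_ready: Callable[[], bool],
    switch_traffic: Callable[[], None],
    stop_old: Callable[[], None],
    stop_new: Callable[[], None],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run steps 1-4; the old server keeps serving until the new one is ready."""
    start_new()  # step 2 (step 1 is simply not touching the old server)
    deadline = time.monotonic() + timeout_s
    while not is_ready():
        if time.monotonic() >= deadline:
            stop_new()  # give up; traffic never left the old server
            return False
        sleep(interval_s)
    switch_traffic()  # step 3
    stop_old()  # step 4
    return True
```

In practice you would pass something like is_ready=lambda: rasa_is_ready("http://new-host:5005") (the host name is a placeholder). Because traffic only moves after the readiness check passes, the slow startup costs deployment time but not availability.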

That’s our current setup. I’m trying to find a way to speed up the deployment, because starting a new Rasa instance in ECS is quite slow.

There is no way around it. Rasa is slow to start up, even on beefier servers. The model always downloads additional model dependencies. You could reduce the time if you manage to cache the downloaded dependency models between the servers, but I haven’t tried that.

If you use multiple Rasa instances per bot, the problems multiply, and it would probably make sense to use K8s for that.