I’m thinking of storing the Rasa model on our own servers and fetching it from there for deployment. The Model Storage documentation has an option (wait_time_between_pulls) for specifying how often the model should be pulled. What I don’t understand is why anyone would want to pull it every 10 seconds instead of just once, since the model does not change in that time, right? I’d like to understand why someone would choose this, so any insights would be great. Maybe @stephens has some ideas?
Good question. I have no official knowledge from Rasa Corp, but I would assume it is efficient and only downloads the model when it has actually been updated. Checking often ensures that a new model is picked up soon after it is published. If it isn’t careful about that, though, frequent checks could waste network resources.
Thanks @tomp, I agree. And if the model isn’t retrained often, you would probably just set the pull interval to null so it is fetched only once, at startup?
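To put rough numbers on this trade-off, here is a back-of-the-envelope sketch. It assumes the worst case, where every poll transfers the full model, and a hypothetical deployment with a 50 MB model and 3 instances. The wait_time_between_pulls parameter mirrors the endpoints.yml setting, with None standing in for null (one fetch at startup); 100 seconds is included because I believe it is the documented default.

```python
from typing import Optional

SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(wait_time_between_pulls: Optional[float]) -> int:
    """Requests one Rasa instance sends to the model server per day.

    None stands for `wait_time_between_pulls: null`: a single fetch at startup.
    """
    if wait_time_between_pulls is None:
        return 1
    return int(SECONDS_PER_DAY // wait_time_between_pulls)


def worst_case_gb_per_day(wait_time_between_pulls: Optional[float],
                          model_size_mb: float, instances: int) -> float:
    """Daily traffic if every poll downloads the full model (1 GB = 1000 MB)."""
    return polls_per_day(wait_time_between_pulls) * model_size_mb * instances / 1000


# Hypothetical deployment: a 50 MB model served to 3 instances.
for wait in (10, 100, 3600, None):
    print(f"wait={wait}: {polls_per_day(wait)} polls/day per instance, "
          f"up to {worst_case_gb_per_day(wait, 50, 3):.2f} GB/day in total")
```

If the model server can answer "not modified" for an unchanged model, the real transfer volume is far lower; the number of requests stays the same, though.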
10 seconds seems aggressive to me. The source code shows that the model server is queried at each interval and the fingerprint is checked to see whether the model differs from the one currently loaded.
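The fingerprint check can be made cheap with an HTTP conditional GET. As far as I can tell from the Rasa source, the client sends the current model’s fingerprint in an If-None-Match header and treats a 304 Not Modified response as "no new model". The sketch below shows that general pattern with only the Python standard library; the in-memory model server, the /models/default path and the SHA-256 fingerprint are illustrative assumptions, not Rasa’s actual implementation, and the sleep between polls is left out.

```python
import hashlib
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional, Tuple

# Hypothetical in-memory model archive; a real server would serve a .tar.gz file.
MODEL = {"bytes": b"model-v1"}


def fingerprint(data: bytes) -> str:
    """Illustrative fingerprint: a SHA-256 hash of the model bytes."""
    return hashlib.sha256(data).hexdigest()


class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        etag = fingerprint(MODEL["bytes"])
        if self.headers.get("If-None-Match") == etag:
            # The client already has this model: reply 304 with no body.
            self.send_response(304)
            self.send_header("ETag", etag)
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("ETag", etag)
        self.send_header("Content-Length", str(len(MODEL["bytes"])))
        self.end_headers()
        self.wfile.write(MODEL["bytes"])

    def log_message(self, *args) -> None:
        pass  # keep the demo quiet


def pull_if_changed(url: str, current: Optional[str]) -> Optional[Tuple[str, bytes]]:
    """Return (fingerprint, model bytes) for a new model, or None if unchanged."""
    request = urllib.request.Request(url)
    if current is not None:
        request.add_header("If-None-Match", current)
    try:
        with urllib.request.urlopen(request, timeout=5) as response:
            return response.headers["ETag"], response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:  # urllib surfaces 304 as an HTTPError
            return None
        raise


server = HTTPServer(("127.0.0.1", 0), ModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/models/default"

current: Optional[str] = None
model_bytes = b""
results = []
for step in range(3):
    # A real poller would sleep wait_time_between_pulls seconds here.
    pulled = pull_if_changed(url, current)
    results.append(pulled is not None)
    if pulled is not None:
        current, model_bytes = pulled  # a real client would load the new model here
    if step == 1:
        MODEL["bytes"] = b"model-v2"  # simulate publishing a retrained model
server.shutdown()
server.server_close()
print(results)  # [True, False, True]
```

The second poll costs a round trip but no model transfer, so as long as the model server honours If-None-Match, a short interval mainly costs requests rather than bandwidth.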
There are different approaches to updating the model on a production system, and this could be one of them (particularly on Kubernetes, where you may have any number of production pods). I prefer k8s rolling updates triggered by a CD pipeline. You could also use the HTTP API to load a new model, but then you have to connect to each production instance.
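For the HTTP API route, the per-instance work might look roughly like this sketch: one PUT /model request per Rasa server (each started with --enable-api), with a JSON body naming the model_file to load. The instance URLs, the model path and the injectable sender are hypothetical; the path must be readable by each server, and authentication and retries are left out.

```python
import json
import urllib.request
from typing import Callable, Dict, List

# Hypothetical instance URLs; on Kubernetes you would discover the pod addresses yourself.
INSTANCES = ["http://rasa-0:5005", "http://rasa-1:5005"]


def build_replace_model_request(base_url: str, model_file: str) -> urllib.request.Request:
    """Build a PUT /model request asking one Rasa server to load model_file."""
    body = json.dumps({"model_file": model_file}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/model",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> int:
    """Send the request for real; urlopen raises HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.status


def roll_out(instances: List[str], model_file: str,
             sender: Callable[[urllib.request.Request], int] = send) -> Dict[str, int]:
    """Push the model to every instance in turn; stops at the first exception."""
    return {url: sender(build_replace_model_request(url, model_file)) for url in instances}


# Demo with a fake sender so it runs without any Rasa servers.
sent = []


def fake_sender(request: urllib.request.Request) -> int:
    sent.append((request.get_method(), request.full_url, json.loads(request.data)))
    return 204  # pretend every instance accepted the model


statuses = roll_out(INSTANCES, "/app/models/my-model.tar.gz", fake_sender)
print(statuses)
```

Because the instances are updated one at a time, a failure partway through leaves some pods on the old model and some on the new one, which is part of why a CD-driven rolling update can be easier to reason about.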