How to periodically pull models from the cloud

According to the official documentation (which you can read here), with the current version of Rasa we can:

  • monitor the local model folder to check for new models, according to this section
  • monitor a remote server, pulling new models every N seconds, according to this section
  • load a model from cloud storage, but ONLY ONE explicit model at a time, passed to the “run” command, according to this section (see the sketch right after this list).
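
For that third option, this is roughly what the startup command looks like, as far as I can tell from the docs (bucket and model names are placeholders):

# One-shot load of a single model from S3 at startup; no polling afterwards.
# BUCKET_NAME is, IIRC, the env var the built-in aws persistor reads.
BUCKET_NAME=my_bucket_name rasa run --model my_model.tar.gz --remote-storage aws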

It sounds to me that I cannot have a continuous sync with, for example, an AWS S3 bucket that is periodically polled for a new model uploaded separately.

Is this summary correct? And if so, is there a way to accomplish this anyway (using external services or custom Persistors)?

I think you can call this endpoint to give the server a new model to load.
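
If it helps, here is a minimal sketch of that call (assuming the Rasa server was started with --enable-api on localhost:5005; the model path is a placeholder):

import requests

# Ask a running Rasa server to replace its currently loaded model.
resp = requests.put(
    "http://localhost:5005/model",
    json={"model_file": "models/my_model.tar.gz"},  # placeholder path
)
resp.raise_for_status()  # the API answers 204 No Content on success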

Tbh, I wrote my own server that connects to any S3 bucket it has been granted access to (chosen via a default constant or an environment variable) and that can be used as an external HTTP model server. Hope this can help: app.py

import os

from flask import Flask, send_from_directory, request, Response
from os.path import dirname, basename
import boto3

BUCKET_NAME = "my_bucket_name"

app = Flask(__name__)


def get_latest_added_obj(bucket_name):
    """Return (key, last_modified, etag) for the most recently uploaded object."""
    s3 = boto3.client('s3')
    objs = s3.list_objects_v2(Bucket=bucket_name).get('Contents', [])
    if objs:
        # Pick the newest object; sorting ascending and taking the first
        # element would return the OLDEST one instead.
        latest = max(objs, key=lambda obj: obj['LastModified'])
        return latest['Key'], latest['LastModified'], latest['ETag']


def download_obj(bucket_name, filename, local_dir):
    """Download the object into local_dir unless it is already cached there."""
    s3 = boto3.resource('s3')
    os.makedirs(local_dir, exist_ok=True)
    out_complete_path = os.path.join(local_dir, filename)
    if not os.path.exists(out_complete_path):
        app.logger.info(f"downloading {filename} from {bucket_name}")
        s3.Bucket(bucket_name).download_file(filename, out_complete_path)
    else:
        app.logger.info(f"serving {filename} from local cache")


def download_file(path):
    # Optional helper (not used by the route below) to serve an arbitrary local file.
    # Note: attachment_filename was renamed download_name in Flask 2.0.
    return send_from_directory(dirname(path), basename(path), as_attachment=True, download_name=basename(path))


@app.route('/models/default', methods=['GET'])
def serve():
    bucket_name = os.environ.get("BUCKET_NAME") or BUCKET_NAME
    local_dir = "tmp_models"
    latest = get_latest_added_obj(bucket_name)
    if latest is None:
        return Response("no model found in bucket", status=404)
    name, last_modify_date, etag = latest
    param_etag = request.headers.get('If-None-Match')

    if param_etag != etag:
        # Covers both a missing If-None-Match header and a stale ETag.
        app.logger.info("new model detected...")
        download_obj(bucket_name, name, local_dir)
        response = send_from_directory(local_dir, name, as_attachment=True, download_name=name)
        response.headers['ETag'] = etag
        return response
    else:
        app.logger.info("same model, returning NOT-MODIFIED")
        return Response(status=304)  # NOT MODIFIED


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)
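
To quickly check that the server behaves like a model server should, you can simulate the conditional GET that Rasa issues on every pull (assuming the Flask app above is running locally on port 8080):

import requests

# First pull: no ETag yet, so we get 200 plus the model archive.
first = requests.get("http://localhost:8080/models/default")
etag = first.headers["ETag"]

# Second pull with If-None-Match: an unchanged model yields 304 NOT-MODIFIED.
second = requests.get("http://localhost:8080/models/default",
                      headers={"If-None-Match": etag})
print(first.status_code, second.status_code)  # expected: 200 304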

Then, you just have to point to this endpoint through the endpoints.yml file, like this:

models:
  url: "http://models_server:8080/models/default"
  wait_time_between_pulls: 300  # optional (default: 100 seconds)

(“models_server” is the service name I assigned to this server in my docker-compose file; see the sketch below)
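
For completeness, here is a minimal docker-compose sketch of how the two services could be wired together (image, build path, and volume layout are assumptions, not my exact setup):

services:
  models_server:
    build: ./models_server      # folder containing app.py (and tmp_models/)
    environment:
      - BUCKET_NAME=my_bucket_name
    ports:
      - "8080:8080"             # exposed only for local testing
  rasa:
    image: rasa/rasa:latest
    command: run --enable-api
    volumes:
      - ./endpoints.yml:/app/endpoints.yml
    depends_on:
      - models_server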

Final note: downloaded models are cached in a folder named “tmp_models” at the same level as the app.py file; the code above creates it automatically on the first download.