Interpreter object takes 2 minutes to load, and loads every time the Azure Function is called

I really need some help on this. The command

interpreter = Interpreter.load(model_directory1)

is taking too long (about 2 minutes) to load the model — it builds the model from scratch, downloading the BERT vocab and other configuration. This happens every time the Azure Function is called. Is there a way to save this interpreter object and just consume it in an Azure Function?

Following are the files and code I am using. I am using a pre-trained model and named its directory "nlu_new". I am able to get predictions.

requirements.txt:

rasa[transformers]

import logging
import json

import azure.functions as func
from rasa.nlu.model import Interpreter

model_directory1 = "nlu_new"  # directory of the extracted pre-trained model

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    message = req.get_json()

    # This loads the model from scratch on every invocation -- the slow part
    interpreter = Interpreter.load(model_directory1)

    result = interpreter.parse(message, only_output_properties=False)

    return func.HttpResponse(json.dumps(result), mimetype="application/json")

Model trained on this pipeline config:

language: en

pipeline:
- name: HFTransformersNLP
  model_weights: "bert-base-uncased"
  model_name: "bert"
- name: LanguageModelTokenizer            # splits the sentence into tokens
- name: LanguageModelFeaturizer

- name: DIETClassifier

You should be running Rasa in a separate, constantly-running container and call its REST or socket channel from your Azure Function.
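With that setup, the Azure Function only makes an HTTP call. A minimal sketch of calling Rasa's `/model/parse` endpoint (available when the container runs `rasa run --enable-api`); the host name `rasa-server` and port `5005` are assumptions to adapt to your deployment:

```python
import json
import urllib.request

RASA_URL = "http://rasa-server:5005/model/parse"  # assumed host/port

def build_parse_request(text: str, url: str = RASA_URL) -> urllib.request.Request:
    """Build the POST request for Rasa's /model/parse endpoint."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

def parse_message(text: str) -> dict:
    """Send the text to the always-running Rasa container and return the parse result."""
    with urllib.request.urlopen(build_parse_request(text), timeout=10) as resp:
        return json.loads(resp.read())
```

The model stays loaded in the Rasa container, so each function invocation pays only the cost of one HTTP round trip.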

I tried using lru_cache, and on a dedicated machine it works fine, replying within a second. On Azure, however, the process is recycled every 5 to 10 minutes. I guess that is due to the plan we have for Azure Functions; changing to a plan with dedicated Azure resources, or an external cache, should solve the problem. Thanks!
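For reference, the lru_cache approach looks like this. A counter stands in for the expensive `Interpreter.load` call so the caching behaviour is visible; in the real function the body would return `Interpreter.load(model_dir)`:

```python
from functools import lru_cache

load_count = 0  # counts how many times the expensive load actually runs

@lru_cache(maxsize=1)
def get_interpreter(model_dir: str):
    """Load the model once per process; repeat calls return the cached object."""
    global load_count
    load_count += 1
    return f"interpreter-for-{model_dir}"  # placeholder for the Interpreter object

get_interpreter("nlu_new")
get_interpreter("nlu_new")  # cached: the loader body does not run again
```

Note the cache lives in the worker process, so it disappears whenever Azure recycles the process — which is exactly the 5-to-10-minute behaviour described above.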

It looks like you are loading the model for every request and then running inference. You can load the model outside your API endpoint, i.e. at the global (module) level, so that it is loaded once when the worker starts and kept in memory for subsequent requests.