Possible memory leak

I’ve deployed the demo chatbot at a certain endpoint and am using Locust to stress test it. I have a single user (with a hatch rate of 1) sending the message “hello” to that endpoint to benchmark the application. Each time the server receives a request, the NLP engine tries to classify the intent and then responds with an appropriate reply.

Here’s my locust script:

from locust import HttpUser, task, between

apiUrl = "/webhooks/rest/webhook"  # placeholder: path of the deployed bot's REST endpoint

class LoadCreate(HttpUser):
    wait_time = between(0.5, 0.51)

    def on_start(self):
        self.name = "locust-user"  # sender id used in the conversation payload

    @task
    def sayHello(self):
        payload = {"sender": self.name, "message": "Hello!"}
        response = self.client.post(apiUrl, json=payload)
        print(response.text)
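For reference, the test is run headless with a single user, along the lines of locust -f locustfile.py --headless -u 1 -r 1 --host <bot-host> (the exact flags depend on the Locust version; older releases spelled the spawn rate as --hatch-rate).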

The results from Locust look abnormal to me. Even though the number of users is constant (1), the response time keeps increasing. Is there any explanation for this behaviour? I’m attaching the plots I obtained.

[plot: Locust response time over the run, 1 constant user]
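To confirm whether server-side memory is actually growing during the run, the rasa process’s resident set size can be logged alongside the load test. A minimal sketch (assuming psutil is installed; the PID below is a placeholder for the actual server process):

import time
import psutil

RASA_PID = 12345  # placeholder: PID of the rasa server process

proc = psutil.Process(RASA_PID)
while True:
    rss_mb = proc.memory_info().rss / (1024 * 1024)  # resident memory in MB
    print(f"{time.strftime('%H:%M:%S')}  RSS: {rss_mb:.1f} MB")
    time.sleep(5)  # sample every 5 seconds

If RSS climbs steadily while the response times climb, that points at a leak rather than, say, request queueing.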

To find the root of this behaviour, I tried hitting the NLU server directly (/model/parse). It seems Rasa Core is responsible. The plot below shows two runs: the first (left) is the response time when only NLU is in play; the second (right) shows the response time when both NLU and Core are running.

[plot: total requests per second, run 1 (NLU only, left) vs. run 2 (NLU + Core, right)]
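For reference, the NLU-only run was measured by posting straight to the parse endpoint, roughly like this (a sketch, assuming the HTTP API is enabled and the server listens on localhost:5005):

import time
import requests

NLU_URL = "http://localhost:5005/model/parse"  # assumed host/port

for i in range(1000):
    start = time.monotonic()
    resp = requests.post(NLU_URL, json={"text": "hello"})
    elapsed_ms = (time.monotonic() - start) * 1000
    print(f"request {i}: status {resp.status_code}, {elapsed_ms:.1f} ms")
    time.sleep(0.5)  # roughly the same pacing as the Locust script

This sends the same message at the same pacing as the Locust test, but bypasses Core entirely.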