High memory usage when using RASA NLU as an HTTP Server

I am currently using two RASA HTTP servers with the spacy_sklearn pipeline.

Both servers have around 4-6 projects each (each project only has one model).

It seems like each time I make a call to a new project, the spaCy vectors get loaded into memory again. The reasons for my suspicion are:

  • The first time I query a new project, it takes ~1 minute to respond
  • My RAM usage also increases by ~1 GB with each such model

With all the models loaded, my RAM (8 GB) overflows into swap space and my computer basically becomes unusable.

I remember that when running an interpreter from Python there used to be a ComponentBuilder option which would cache the spaCy vectors between different interpreters. Why does that not happen automatically with the HTTP server?
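
For reference, this is roughly how that caching worked when loading interpreters directly from Python. It is only a minimal sketch against the old rasa_nlu API; the model paths are placeholders and the exact `Interpreter.load` signature may differ between versions:

```python
from rasa_nlu.components import ComponentBuilder
from rasa_nlu.model import Interpreter

# A single ComponentBuilder caches heavyweight components such as the
# spaCy language model, so they are only loaded from disk once.
builder = ComponentBuilder(use_cache=True)

# Both interpreters reuse the cached spaCy vectors instead of each
# holding its own ~1 GB copy. The model paths below are placeholders.
interpreter_a = Interpreter.load("projects/project_a/model_1", component_builder=builder)
interpreter_b = Interpreter.load("projects/project_b/model_1", component_builder=builder)
```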


@tmbo any ideas about this?

That should happen with the HTTP server as well. But I think the server might use multiple processes, so it will load the model once in every process. If you hit the server with train calls, those will also load the spaCy vectors again, since training runs in a process pool.
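
This is not Rasa's actual server code, just a generic Python illustration of why an in-process cache cannot be shared across worker processes (it assumes spaCy is installed; the model name is a placeholder):

```python
import os
from concurrent.futures import ProcessPoolExecutor

import spacy


def load_vectors(_):
    # Each worker process has its own address space, so the spaCy model
    # (and anything a ComponentBuilder cached) is loaded again here.
    spacy.load("en_core_web_md")  # placeholder model name
    return os.getpid()


if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=4) as pool:
        # Four distinct PIDs mean four independent copies of the vectors in RAM.
        print(set(pool.map(load_vectors, range(4))))
```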


So what do you suggest I do? I tried switching to the tf pipeline, but it misclassifies more often due to the limited number of training examples.