Hello,
I have been developing conversational agents for a long time, first at university and now at my company. I started building them with version 2.0, and at that time I wasn’t monitoring memory usage much. Now I’m working for myself, and we’re building a product for a company that combines speech technologies with a robot. We have built 50 conversational agents for them, so I’m interested in whether there is any way to reduce the memory consumption of the models. I am using the default pipeline and Rasa version 3.3. I wonder whether changing the batch size would help. Currently, each model consumes between 800 MB and 1 GB of RAM, depending on the size of the model.
I also have a second question. From my testing, it seems that Rasa models are trained on the CPU cores, and I’m wondering whether there is a command to limit this, or to choose how many cores Rasa occupies during training or startup. If anyone knows the answers to these questions, I would be very grateful for any clarification.
Reducing the memory consumption of Rasa models and controlling CPU core usage during training or startup can be essential for efficient conversational agent development. Here are some insights based on your questions:
Reducing Memory Consumption:
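You mentioned that each model consumes between 800 MB and 1 GB of RAM. One approach is to use a smaller, more efficient model. Rasa lets you customize the pipeline in config.yml, so you can remove components you don’t need or, if your pipeline loads a pretrained language model, switch to a smaller one.
Additionally, you can experiment with reducing the batch size during training, for example through the batch_size option of components such as DIETClassifier. Smaller batches generally lower peak memory during training, but they can change training time and accuracy, so it’s a trade-off to consider. Note that batch size mainly affects training; it is unlikely to change much how much RAM a trained model needs once it is loaded for serving.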
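To judge whether a batch size change actually helps, it is useful to measure peak memory rather than guess. The following is a minimal Python sketch, assuming a Unix-like system (the resource module is not available on Windows) and assuming the training run is started as a child process. The helper name peak_child_rss_mb is hypothetical and is not part of Rasa.

```python
import resource
import subprocess
import sys


def peak_child_rss_mb(command):
    """Run `command` to completion and return the peak resident memory (MB)
    of the child processes this Python process has waited for.

    RUSAGE_CHILDREN reports a running maximum over all finished children,
    so use a fresh Python process for each measurement you want to compare.
    """
    subprocess.run(command, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


# Example (assumes `rasa` is on PATH and is run from a Rasa project directory):
# print(peak_child_rss_mb(["rasa", "train", "--config", "config.yml"]))
```

Running this once per configuration, each time in a new Python process, gives comparable peak figures for different batch sizes.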
Controlling CPU Core Usage:
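Rasa trains its models with TensorFlow, and it reads the environment variables TF_INTRA_OP_PARALLELISM_THREADS and TF_INTER_OP_PARALLELISM_THREADS to cap how many threads TensorFlow uses; set them before starting Rasa. You can also manage CPU core usage at the system level.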
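For thread limits specifically, the Rasa documentation on configuring TensorFlow describes the environment variables TF_INTRA_OP_PARALLELISM_THREADS and TF_INTER_OP_PARALLELISM_THREADS. Below is a minimal Python sketch of launching training with those variables set. The function names are hypothetical, and setting OMP_NUM_THREADS as well is an assumption on my part: it is a general OpenMP variable that some numeric libraries respect, not a Rasa-specific setting.

```python
import os
import subprocess


def thread_limited_env(num_threads, base_env=None):
    """Return a copy of the environment with thread-count limits applied."""
    env = dict(os.environ if base_env is None else base_env)
    limit = str(num_threads)
    # Variables described in the Rasa docs for configuring TensorFlow.
    env["TF_INTRA_OP_PARALLELISM_THREADS"] = limit
    env["TF_INTER_OP_PARALLELISM_THREADS"] = limit
    # Assumption: generic OpenMP limit, not Rasa-specific.
    env["OMP_NUM_THREADS"] = limit
    return env


def train_with_thread_limit(num_threads, config_path="config.yml"):
    """Run `rasa train` with the limits above (assumes `rasa` is on PATH)."""
    command = ["rasa", "train", "--config", config_path]
    return subprocess.run(command, env=thread_limited_env(num_threads), check=True)
```

Calling train_with_thread_limit(2) from a Rasa project directory would start training with each limit set to 2. Whether this lowers CPU usage enough for your setup is something to verify by watching the process during a run.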
On Linux, you can use tools like “taskset” to control which CPU cores a process can run on. On Windows, you can use Task Manager to set CPU affinity for a specific process.
Be cautious when restricting CPU affinity: giving a process too few cores slows training and startup, and pinning many agents to the same cores makes them compete with each other. It’s advisable to test these changes in a controlled environment.
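On Linux, the same effect as taskset can be reached from Python with os.sched_setaffinity, since child processes inherit the affinity of the process that starts them. The sketch below is Linux-only, and the helper names pin_to_cores and run_pinned are hypothetical, chosen for illustration.

```python
import os
import subprocess


def pin_to_cores(cores):
    """Restrict the current process to the given CPU core ids (Linux only)."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    os.sched_setaffinity(0, set(cores))  # pid 0 means the calling process
    return os.sched_getaffinity(0)


def run_pinned(command, cores):
    """Run `command` restricted to `cores`, then restore our own affinity."""
    previous = os.sched_getaffinity(0)
    pin_to_cores(cores)
    try:
        # The child inherits the restricted affinity mask at launch.
        return subprocess.run(command, check=True)
    finally:
        os.sched_setaffinity(0, previous)


# Example (assumes `rasa` is on PATH and cores 0 and 1 exist):
# run_pinned(["rasa", "train"], {0, 1})
```

Affinity limits which cores the process may run on, not how many threads it creates, so it can make sense to combine it with a thread limit.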
Additionally, the Rasa documentation on tuning your model is worth checking for the memory and training parameters that apply to your version.
Remember that optimizing memory and CPU usage is a delicate balance between model size, training time, and system performance. Experimentation and testing with different configurations are key to finding the right setup for your conversational agents.