Hey all… If we want to load a Rasa model that uses rasa/LaBSE as its weights, how much memory (RAM) is required?
Hi @souvikg10… any help ?
Hi, consider that the LaBSE weights are about 2 GB in size; at the very minimum you need 2-2.5 GB to run it.
Hi @souvikg10, thanks for the reply. I trained the model on an EC2 instance with 16 GB of RAM, then tried to load the same model for inference on another local machine with a GPU and 16 GB of RAM… but usage peaked at around 12 GB at some point. Any idea why this is happening?
It kind of depends on the load of your bot as well. Is it only due to LaBSE?
@souvikg10 Hi… Yes, it's only when I load LaBSE. While the model is loading, memory usage goes up to 34% of 16 GB (around 5.5 GB), and once the model is loaded it settles at 19%, which is around 3 GB. Is there any way to restrict this? My model training happens on a big server, but deployment happens on a production server where I have only 4 GB of RAM. Any solution for this?
The weights are about 2 GB in size, and loading them with TensorFlow will use that memory. Have you tried other transformers that are smaller?
You could potentially fine-tune it using SBERT and use PyTorch instead, but you would have to write a featurizer yourself, and it won't work with DIET if you are using DIET as a classifier.
PyTorch is a bit cleverer about memory management than TensorFlow.
@souvikg10 Thanks for the reply. I want to deploy LaBSE because it's giving good results on my dataset. I tried bert-base-uncased and bert-large-uncased, but they are not performing well. Is there any reference for using SBERT?
Bigger models require more memory; you cannot fit this into a 4 GB machine.
That being said, PyTorch's memory footprint might be smaller; I haven't tested it. However, the following pipeline could work for you:
```python
from sentence_transformers import SentenceTransformer

loaded_weights = SentenceTransformer("rasa/LaBSE")
features = loaded_weights.encode(
    texts,                  # list of strings to embed
    show_progress_bar=True,
    batch_size=32,
)
```
You can then feed these features to a classifier such as logistic regression. It is quite performant and potentially more memory-efficient than TensorFlow.
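A minimal sketch of that second step. In practice the feature matrix would come from `SentenceTransformer("rasa/LaBSE").encode(texts)` as above; here random 768-dimensional vectors stand in for the embeddings so the example runs without downloading the 2 GB weights, and the two intent labels are made up for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Simulate LaBSE sentence embeddings (LaBSE outputs 768-dim vectors).
# Replace this with the `features` array produced by .encode(texts).
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 768))   # 20 "utterances"
labels = np.array([0, 1] * 10)          # two hypothetical intent classes

# A lightweight classifier on top of the frozen embeddings.
clf = LogisticRegression(max_iter=1000)
clf.fit(features, labels)

pred = clf.predict(features[:1])        # predicted intent for the first utterance
```

Since the embedding model is frozen, only the small logistic-regression head is trained, which keeps both training and inference cheap.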
Hi @souvikg10, thanks for the suggestion. I will definitely try this out. But I want to know one more thing. I found a lighter version of rasa/LaBSE on Hugging Face, cointegrated/LaBSE-en-ru, which only supports two languages, and I am able to load it properly. Can I use this repository in production? Will there be any problem?
Any reason you want a multilingual base model? There are plenty of sentence embedding models out there.
I would still recommend trying LaBSE with SBERT instead of a distilled LaBSE.
@imnaren142, hi! I had a similar problem. In my experiments I found that I could limit the container's memory to roughly the model size plus the container's baseline usage without getting OOM-killed; with any limit smaller than that, I did get OOM. Maybe it's only my case…
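As a sketch of that rule of thumb, assuming a Docker deployment (the thread doesn't say which container runtime is used) and the ~3 GB loaded-model footprint reported above; the image name is a placeholder:

```shell
# Cap the container at roughly model size (~3 GB observed after loading)
# plus some headroom for the container's baseline usage.
# Setting --memory-swap equal to --memory also disables swap,
# so the limit is a hard one.
docker run --memory=3.5g --memory-swap=3.5g my-rasa-bot:latest
```

If the limit is set below the model's real footprint, the kernel's OOM killer terminates the process inside the container instead of letting it swap.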