I hope this message finds you well. I am currently facing a challenge related to memory usage and featurizer selection in my Rasa NLU pipeline, and I am reaching out to seek your valuable insights and expertise.
I have a substantial dataset comprising 19 intents, each with 30,000 examples, totaling 570,000 training examples. My system is equipped with 45 GB of RAM and a Tesla V100s GPU with 32 GB of memory. Despite these resources, I am encountering out-of-memory (OOM) errors during training, specifically right after the DIETClassifier component starts.
Additionally, my vocabulary is very large: 2,185,209 vocabulary items were created for the text attribute during training. I took this figure from the training logs: "rasa.nlu.featurizers.sparse_featurizer.count_vectors_featurizer - 2185209 vocabulary items were created for text attribute." I have experimented with various featurizers, including CountVectorsFeaturizer, SpacyFeaturizer, ConveRTFeaturizer, and others, but I am struggling to find the right balance between memory efficiency and feature richness.
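To give a sense of scale, here is a rough back-of-the-envelope sketch of what these numbers would amount to if the bag-of-words features were ever held as a dense matrix. The 4-byte float32 width and the batch size of 64 are my own assumptions for illustration; they are not taken from my configuration, and as far as I understand the featurizer itself stores these features sparsely, so this is only an upper bound, not a measurement.

```python
def dense_matrix_bytes(n_rows: int, n_cols: int, bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense n_rows x n_cols matrix (float32 assumed by default)."""
    return n_rows * n_cols * bytes_per_value


N_EXAMPLES = 19 * 30_000   # 19 intents x 30,000 examples each (from the post)
VOCAB_SIZE = 2_185_209     # vocabulary items reported in the training log
BATCH_SIZE = 64            # hypothetical batch size, chosen only for illustration

# Whole training set, if it were ever densified at once.
full_bytes = dense_matrix_bytes(N_EXAMPLES, VOCAB_SIZE)
# A single batch, if one batch were densified.
batch_bytes = dense_matrix_bytes(BATCH_SIZE, VOCAB_SIZE)

print(f"examples:            {N_EXAMPLES:,}")
print(f"full dense matrix:   {full_bytes / 1e12:.2f} TB")
print(f"one dense batch:     {batch_bytes / 1e6:.1f} MB")
```

Even under these simplified assumptions, the vocabulary dimension dominates the estimate, which is why I suspect it is the main driver of the OOM errors.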
I am seeking advice on the following aspects:
- Memory Optimization: Are there specific techniques or configurations I can implement to reduce memory usage during training? I have already experimented with batch sizes and model complexity.
- Featurizer Selection: Considering the complexity of my dataset, which featurizer(s) would be most suitable to extract informative features while optimizing memory usage? So far I have tried SpacyFeaturizer, CountVectorsFeaturizer, and ConveRTFeaturizer, among others.
- Vocabulary Size Management: My substantial vocabulary size is a significant contributor to memory usage. Are there effective strategies to manage vocabulary size without compromising the quality of the model?
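To make the vocabulary question concrete, this is a minimal sketch of the kind of pruning I have in mind: dropping tokens that occur fewer than a minimum number of times and optionally capping the vocabulary at the most frequent entries. It is plain Python, not Rasa code; the function name `build_capped_vocab`, its parameters, the whitespace tokenization, and the sample sentences are all hypothetical and only illustrate the idea.

```python
from collections import Counter
from typing import Iterable, List, Optional


def build_capped_vocab(
    texts: Iterable[str],
    min_count: int = 2,
    max_size: Optional[int] = None,
) -> List[str]:
    """Return tokens seen at least `min_count` times, most frequent first,
    truncated to `max_size` entries when a cap is given."""
    # Naive lowercase + whitespace tokenization, for illustration only.
    counts = Counter(tok for text in texts for tok in text.lower().split())
    # most_common() sorts by descending count; ties keep first-seen order.
    kept = [tok for tok, n in counts.most_common() if n >= min_count]
    return kept if max_size is None else kept[:max_size]


# Hypothetical sample utterances; "xk42z" stands in for a rare, noisy token.
samples = [
    "book a flight",
    "book a hotel",
    "cancel my flight xk42z",
]

vocab = build_capped_vocab(samples, min_count=2)
small_vocab = build_capped_vocab(samples, min_count=1, max_size=2)
print(vocab)
print(small_vocab)
```

What I would like to know is whether this kind of frequency-based pruning is a reasonable approach here, and what the recommended way to achieve it within the pipeline would be.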
I would greatly appreciate any advice, best practices, or experiences you can share regarding similar challenges you have faced or solutions you have implemented. Your expertise will be immensely helpful in guiding me toward an effective resolution.
Thank you in advance for your time and assistance. I am looking forward to learning from your experiences and insights. I’ve also attached the pipeline configurations!
Best regards