I have a custom Rasa chatbot in Spanish that uses the spaCy model with the EmbeddingIntentClassifier, and a KerasPolicy with an LSTM. My problem is that the model uses all of the GPU's memory during inference. I looked for solutions for restricting memory growth and found the "Use a GPU | TensorFlow Core" guide, but I don't know how to implement these solutions in Rasa.
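For reference, the kind of fix I have in mind (a hedged sketch, not something I've confirmed works with Rasa) is TensorFlow's on-demand allocation. TensorFlow reads the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable at import time, so setting it before Rasa loads TensorFlow should make it allocate GPU memory incrementally instead of grabbing it all up front:

```python
import os

# Sketch: must run before TensorFlow is imported anywhere in the process.
# Equivalent to `export TF_FORCE_GPU_ALLOW_GROWTH=true` before `rasa run`.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
```

But since Rasa imports TensorFlow itself, I'm not sure where (or whether) I can hook this in, or if there is a supported configuration option instead.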
I appreciate your help.