Shared embeddings

Hello, I am using Rasa for the NER part. I have trained two models using BERT embeddings in the config.yml file.

One model is for the Hindi language and the other is for English, but both use the same multilingual uncased BERT pretrained embeddings.

Right now, when I load both models using rasa run --enable-api, each model loads its own copy of the BERT embeddings and takes up a lot of RAM.

My question is: is there any way to make different Rasa models use shared embeddings, i.e. the models are different but the embeddings are loaded only once?

Please help me out with this; any help would be appreciated.

@Anvesh - you can add cache_dir to your config; here is an example from the docs. Use the same directory in both configs for saving the model weights. This way, both pipelines will try to pick up the weights from a shared location.

pipeline:
  - name: LanguageModelFeaturizer
    # Name of the language model to use
    model_name: "bert"
    # Pre-Trained weights to be loaded
    model_weights: "rasa/LaBSE"

    # An optional path to a directory from which
    # to load pre-trained model weights.
    # If the requested model is not found in the
    # directory, it will be downloaded and
    # cached in this directory for future use.
    # The default value of `cache_dir` can be
    # set using the environment variable
    # `TRANSFORMERS_CACHE`, as per the
    # Transformers library.
    cache_dir: null

@souvikg10 I tried it the way you said:

   - name: "HFTransformersNLP" 
     model_weights: "bert-base-multilingual-cased" 
     model_name: "bert"
     cache_dir: /home/temp/
   - name: LanguageModelTokenizer

I kept this same config file for both models and trained both. They load from the same cache_dir, but the embeddings are still loaded into memory separately; both still take about 2 GB each.

Let me know if any corrections are required

Yeah, this solution only lets them share common disk storage, so you won't have to download or duplicate the weights on disk for each run.

If you want to share the embeddings in memory, you would need to instantiate the HF model once and run the training for both in a single process. I think that currently, when the LanguageModelTokenizer or Featurizer component is initialized, the model is loaded as an instance attribute, so each training job creates a new instance and loads the weights again.
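For example, a minimal sketch of the single-process idea, assuming Rasa's Python API exposes a train() helper (it is rasa.train in Rasa 2.x) and using made-up file names like config_hi.yml, config_en.yml and data_hi/:

# Sketch only: run both trainings in one Python process so that a shared,
# module-level copy of the BERT weights could be reused. On its own this does
# not reduce memory -- it just sets the stage for the custom component below.
import rasa

for config, data in [("config_hi.yml", "data_hi/"),   # assumed file names
                     ("config_en.yml", "data_en/")]:
    rasa.train(
        domain="domain.yml",
        config=config,
        training_files=data,
        output="models/",
    )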

What you can do is replicate LanguageModelFeaturizer as a custom component, and instead of loading the HF model on the instance, load it as a global object or an in-memory cache so you can reuse it within your Python process (see the sketch below). It would still mean that your training has to run within a single process instead of running rasa train twice, once for each config.
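Here is a rough illustration of that global-object / in-memory-cache idea using the transformers library directly; the module and function names are made up, and you would still have to wire this into your copy of the featurizer yourself:

# shared_model_cache.py -- illustrative only, names are made up.
# Keep one copy of each pretrained model/tokenizer per process, keyed by the
# weights name, and hand the same objects to every component that asks.
from transformers import AutoTokenizer, TFAutoModel

_MODEL_CACHE = {}  # model_weights -> (tokenizer, model)

def get_shared_model(model_weights: str, cache_dir: str = None):
    """Load the tokenizer/model once per process and reuse them afterwards."""
    if model_weights not in _MODEL_CACHE:
        tokenizer = AutoTokenizer.from_pretrained(model_weights, cache_dir=cache_dir)
        model = TFAutoModel.from_pretrained(model_weights, cache_dir=cache_dir)
        _MODEL_CACHE[model_weights] = (tokenizer, model)
    return _MODEL_CACHE[model_weights]

In your custom copy of the featurizer you would call get_shared_model("bert-base-multilingual-cased", cache_dir="/home/temp/") wherever the original loads the model, so the second pipeline gets the already-loaded objects instead of a fresh ~2 GB copy. Again, this only helps when both pipelines live in the same Python process.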

I am not sure if there is any other out-of-the-box solution for this.