How to truncate sequence length to maximum value

Hi!

I am a new Rasa learner :slight_smile: and I am trying to train a new Rasa NLU model using a simple pipeline, in which I would like to include the BERT LanguageModelFeaturizer. From the documentation (==> Components), I know I should add the following snippet to the pipeline in my config.yml:

  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "rasa/LaBSE"
    cache_dir: ./.cache

but at runtime I get the following error:

"RuntimeError: The sequence length of 'Hi, bla bla bla…' is too long (514 tokens) for the model chosen Bert which has a maximum sequence length of 512 tokens. Either shorten the message or use a model which has no restriction on input sequence length like XLNet."

So, I was wondering whether (and if so, how) it is possible to pass additional hyperparameters that truncate the text to the model's maximum input sequence length (e.g. 512 tokens) whenever the text is longer than expected. Any hint will be appreciated :slight_smile:
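To illustrate what I mean by truncating, here is a rough sketch of the kind of preprocessing I have in mind. Note that this is just my own illustration, not Rasa code: a plain whitespace split stands in for the real BERT tokenizer, and `MAX_SEQ_LENGTH` and `truncate_message` are names I made up.

```python
# Sketch of the truncation I'd like the featurizer to do for me.
# A whitespace split stands in for the real BERT tokenizer here;
# in practice the count would be over BERT subword tokens.
MAX_SEQ_LENGTH = 512  # BERT's maximum input sequence length


def truncate_message(text: str, max_tokens: int = MAX_SEQ_LENGTH) -> str:
    """Keep at most `max_tokens` tokens of `text`, dropping the rest."""
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text  # short enough, leave untouched
    return " ".join(tokens[:max_tokens])
```

Ideally something equivalent would happen inside the pipeline, so I don't have to pre-process every training example and incoming message myself.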


For further details, here is my current pipeline as defined in my config.yml file:

...
pipeline:
  - name: SpacyNLP
    model: "en_core_web_md"
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "rasa/LaBSE"
    cache_dir: ./.cache
...

I hope someone can help me :slight_smile:

Thanks in advance,

Isidoro