Hi!
I am a new Rasa learner, and I am trying to train a new Rasa NLU model using a simple pipeline that includes the BERT LanguageModelFeaturizer.
To do so, I understand that I should include the following snippet in my config.yml pipeline (documentation ==> Components):
- name: LanguageModelFeaturizer
  model_name: "bert"
  model_weights: "rasa/LaBSE"
  cache_dir: ./.cache
but at runtime I get the following error:
"RuntimeError: The sequence length of 'Hi, bla bla bla…' is too long (514 tokens) for the model chosen Bert which has a maximum sequence length of 512 tokens. Either shorten the message or use a model which has no restriction on input sequence length like XLNet."
So I was wondering whether (and if so, how) it is possible to hand over additional hyperparameters to truncate the text to the maximum input sequence length (e.g. 512 tokens) when the text is longer than expected. Any hint would be appreciated.
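In the meantime, as a workaround, I have been pre-truncating the long messages myself with the underlying Hugging Face tokenizer before they reach Rasa. A minimal sketch of what I do (assuming the transformers library is installed and that "rasa/LaBSE" is available on the Hugging Face hub):

from transformers import AutoTokenizer

# Same tokenizer the featurizer loads (assuming "rasa/LaBSE" from the Hugging Face hub)
tokenizer = AutoTokenizer.from_pretrained("rasa/LaBSE")

text = "Hi, bla bla bla ..."  # one of the overly long messages from my NLU data

# Ask the tokenizer to truncate to BERT's 512-token limit
encoded = tokenizer(text, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # now <= 512

# Decode back to text so I can shorten the example in my training data
truncated = tokenizer.decode(encoded["input_ids"], skip_special_tokens=True)

This lets me shorten the offending examples by hand, but I would much prefer a config-level option in the LanguageModelFeaturizer itself, if one exists.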
For further details, below is an example of my current pipeline, as defined in my config.yml file:
...
pipeline:
- name: SpacyNLP
  model: "en_core_web_md"
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: LanguageModelFeaturizer
  model_name: "bert"
  model_weights: "rasa/LaBSE"
  cache_dir: ./.cache
...
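In case it is relevant, I am training with the standard CLI command (the paths here just reflect my local project layout):

rasa train nlu --config config.yml --nlu data/nlu.yml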
I hope someone can help me.
Thanks in advance,
Isidoro