Hi!
I am a new Rasa learner, and I am trying to train a new Rasa NLU model using a simple pipeline that includes the BERT LanguageModelFeaturizer.
To do so, I understand that I should include the following snippet in my config.yml pipeline (documentation ==> Components):
- name: LanguageModelFeaturizer
  model_name: "bert"
  model_weights: "rasa/LaBSE"
  cache_dir: ./.cache
but at runtime I get the following error:
"RuntimeError: The sequence length of 'Hi, bla bla bla…' is too long (514 tokens) for the model chosen Bert which has a maximum sequence length of 512 tokens. Either shorten the message or use a model which has no restriction on input sequence length like XLNet."
So I was wondering whether (and if so, how) it is possible to hand over additional hyperparameters to truncate the text to the maximum input sequence length (e.g. 512 tokens) when the text is longer than expected. Any hint would be appreciated.
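In the meantime, as a workaround, I have been pre-truncating the long messages myself with the underlying Hugging Face tokenizer before they reach Rasa. A minimal sketch of what I do (assuming the transformers library is installed and that "rasa/LaBSE" is available on the Hugging Face hub):

from transformers import AutoTokenizer

# Same tokenizer the featurizer loads (assuming "rasa/LaBSE" from the Hugging Face hub)
tokenizer = AutoTokenizer.from_pretrained("rasa/LaBSE")

text = "Hi, bla bla bla ..."  # one of the overly long messages from my NLU data

# Ask the tokenizer to truncate to BERT's 512-token limit
encoded = tokenizer(text, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # now <= 512

# Decode back to text so I can shorten the example in my training data
truncated = tokenizer.decode(encoded["input_ids"], skip_special_tokens=True)

This lets me shorten the offending examples by hand, but I would much prefer a config-level option in the LanguageModelFeaturizer itself, if one exists.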
For further details, below is an example of my current pipeline, as defined in my config.yml file:
...
pipeline:
- name: SpacyNLP
  model: "en_core_web_md"
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: LanguageModelFeaturizer
  model_name: "bert"
  model_weights: "rasa/LaBSE"
  cache_dir: ./.cache
...
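In case it is relevant, I am training with the standard CLI command (the paths here just reflect my local project layout):

rasa train nlu --config config.yml --nlu data/nlu.yml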
I hope someone can help me.
Thanks in advance,
Isidoro