Support for Language Models inside Rasa

Oh yeah, it’s totally possible to write your own models. In fact, there are plenty of examples over at rasa-nlu-examples, including some custom classifiers and featurizers. Note though that right now we’re transitioning these components to Rasa 3.x. The latest release for Rasa 2.x is found here.

A few caveats though.

  1. Huggingface featurizers are already supported natively, via LanguageModelFeaturizer.
  2. Usually, you should delay writing custom components. Typically the most pressing concern when you’re building an assistant is the data you’re training on. The DIET architecture is pretty good at picking up many patterns across many languages, and I wouldn’t worry too much about an optimal pipeline unless you have a large, representative dataset.
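As a rough sketch, attaching a Hugging Face model via LanguageModelFeaturizer looks like the snippet below. The bert-base-multilingual-cased checkpoint here is just an illustrative example; swap in whichever compatible weights fit your language.

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer
    model_name: bert                             # architecture family known to Rasa
    model_weights: bert-base-multilingual-cased  # any compatible Hugging Face checkpoint
  - name: DIETClassifier
    epochs: 100
```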

@koaning If I want to attach xlm-roberta-base to the pipeline via LanguageModelFeaturizer, is it possible? If so, can you please explain a bit about how I can do that? I’m sorry, but in the documentation I was only able to find bert, gpt, gpt2, xlnet, distilbert, and roberta based models, which is why I had to ask. (If I want to add the xlm-roberta-base model, what should the “model_name” and “model_weights” be, since no defaults are given for xlm-roberta-base in the Rasa documentation?)

… and thank you very much for all the info. That helps a lot.

Just to confirm, in the huggingface section of the Non-English NLU blogpost there’s this snippet.

- name: LanguageModelFeaturizer
  model_name: bert
  model_weights: asafaya/bert-base-arabic

The idea is that a bert-like Hugging Face model can be used in Rasa, but you’ll need to give it appropriate weights. Am I understanding correctly that xlm-roberta-base refers to a non-roberta model?

It’d help if you could share the config.yml file that you tried to run.


@koaning, Adding bert-based models works just fine. I’ve tried it with the following config.

language: si

pipeline:
  - name: "HFTransformersNLP"
    model_name: "roberta"
    model_weights: "keshan/SinhalaBERTo"
    cache_dir: "hf_lm_weights/bert_si"
  - name: "LanguageModelTokenizer"
  - name: "LanguageModelFeaturizer"
  - name: "LexicalSyntacticFeaturizer"
  - name: "CountVectorsFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: "CountVectorsFeaturizer"
    analyzer: "char"
    min_ngram: 3
    max_ngram: 5
  - name: "DIETClassifier"
    entity_recognition: true
    epochs: 300
  - name: "EntitySynonymMapper"
  - name: "ResponseSelector"
    epochs: 300
    retrieval_intent: faq

policies:
  - name: RulePolicy

My question is: is it possible to attach the xlm-roberta-base model in the same way? If I want to add it to the pipeline via LanguageModelFeaturizer, how do I specify model_name and model_weights? That’s where I’m stuck, because I couldn’t find those parameters in the documentation for xlm-roberta based models.


Off the top of my head: xlm-roberta-base would refer to the weights, and the architecture/model_name would be roberta.
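If that’s right, the pipeline entry would look something like the sketch below. This is untested: it assumes Rasa’s roberta loader accepts the xlm-roberta-base checkpoint, which is worth verifying, since XLM-R uses a different tokenizer than vanilla RoBERTa.

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer
    model_name: roberta              # architecture name registered in Rasa
    model_weights: xlm-roberta-base  # Hugging Face checkpoint (assumption: compatible with the roberta loader)
```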


Right! I’ll see if that works. I thought they were different. @koaning Thanks a lot for the help.
