Support for Language Models inside Rasa

Oh yeah, it’s totally possible to write your own models. In fact, there are plenty of examples over at rasa-nlu-examples, including some custom classifiers and featurizers. Note though that right now we’re transitioning these components to Rasa 3.x. The latest release for Rasa 2.x is found here.

A few caveats though.

  1. Huggingface featurizers are already supported natively, via LanguageModelFeaturizer.
  2. Usually, you should delay writing custom components. Typically the most pressing concern when you’re building an assistant is the data you’re training on. The DIET architecture is pretty good at picking up many patterns across many languages, and I wouldn’t worry too much about an optimal pipeline unless you have a large, representative dataset.
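As a rough sketch, attaching a Hugging Face model via LanguageModelFeaturizer looks like the snippet below. The bert-base-multilingual-cased checkpoint here is just an illustrative example; swap in whichever compatible weights fit your language.

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer
    model_name: bert                             # architecture family known to Rasa
    model_weights: bert-base-multilingual-cased  # any compatible Hugging Face checkpoint
  - name: DIETClassifier
    epochs: 100
```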

@koaning If I want to attach xlm-roberta-base to the pipeline via LanguageModelFeaturizer, is it possible? If so, can you please explain a bit about how I can do that? I’m sorry, but in the documentation I was only able to find bert, gpt, gpt2, xlnet, distilbert, and roberta based models, which is why I had to ask. (If I want to add the xlm-roberta-base model, what should the “model_name” and “model_weights” be, since no defaults are given for xlm-roberta-base in the Rasa documentation?)

… and thank you very much for all the info. That helps a lot.

Just to confirm, in the huggingface section of the Non-English NLU blogpost there’s this snippet.

- name: LanguageModelFeaturizer
  model_name: bert
  model_weights: asafaya/bert-base-arabic

The idea is that a bert-like Hugging Face model can be used in Rasa, but you’ll need to give it appropriate weights. Am I understanding correctly that xlm-roberta-base refers to a non-roberta model?

It’d help if you could share the config.yml file that you tried to run.


@koaning, Adding bert-based models works just fine. I’ve tried it with the following config.

language: si

pipeline:
  - name: "HFTransformersNLP"
    model_name: "roberta"
    model_weights: "keshan/SinhalaBERTo"
    cache_dir: "hf_lm_weights/bert_si"
  - name: "LanguageModelTokenizer"
  - name: "LanguageModelFeaturizer"
  - name: "LexicalSyntacticFeaturizer"
  - name: "CountVectorsFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: "CountVectorsFeaturizer"
    analyzer: "char"
    min_ngram: 3
    max_ngram: 5
  - name: "DIETClassifier"
    entity_recognition: true
    epochs: 300
  - name: "EntitySynonymMapper"
  - name: "ResponseSelector"
    epochs: 300
    retrieval_intent: faq

policies:
  - name: RulePolicy

My question is: is it possible to attach the xlm-roberta-base model in the same way? If I want to add it to the pipeline via LanguageModelFeaturizer, how do I specify model_name and model_weights? That’s where I’m stuck, because I couldn’t find those parameters in the documentation for xlm-roberta based models.


Off the top of my head: xlm-roberta-base would refer to the weights, and the architecture/model_name would be roberta.
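If that’s right, the pipeline entry would look something like the sketch below. This is untested: it assumes Rasa’s roberta loader accepts the xlm-roberta-base checkpoint, which is worth verifying, since XLM-R uses a different tokenizer than vanilla RoBERTa.

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer
    model_name: roberta              # architecture name registered in Rasa
    model_weights: xlm-roberta-base  # Hugging Face checkpoint (assumption: compatible with the roberta loader)
```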


Right! I’ll see if that works. I thought they were different. @koaning Thanks a lot for the help.
