Hi @ganbaa_elmer, I think the error you’re seeing comes from the way the model name and weights are mapped to the corresponding Hugging Face classes. I tested this with Rasa 3.0.2 and the config sketched below, and I get an error as well. If you’re using a different Rasa version, the concrete reason might be different though.
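For reference, the relevant part of the pipeline config I tested looked roughly like this (a sketch reconstructed from your model id, using the standard LanguageModelFeaturizer options):

```yaml
pipeline:
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: tugstugi/bert-base-mongolian-uncased
```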
According to here, if you specify model: bert, Rasa tries to initialize a BertTokenizer from the given weights (in your case tugstugi/bert-base-mongolian-uncased). However, you can check which tokenizer this model actually uses directly in HF transformers:
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("tugstugi/bert-base-mongolian-uncased")
print(type(tok))
```
The printed type is not a BertTokenizer, so there seems to be a mismatch between the tokenizer the model actually uses and the one Rasa is trying to load. Since the mapping from model name to tokenizer class is hard-coded, I think it is currently only possible to use Bert models that also use the BertTokenizer. This is not transparent from the documentation and hard to see on the HF model hub. I would suggest opening a ticket to improve the documentation on that.
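If you want to check up front whether a given set of weights is compatible, a quick sketch using only transformers (the model id is the one from your config) is to verify that AutoTokenizer resolves to a BertTokenizer:

```python
from transformers import AutoTokenizer, BertTokenizer, BertTokenizerFast

weights = "tugstugi/bert-base-mongolian-uncased"
tok = AutoTokenizer.from_pretrained(weights)

# Rasa's hard-coded mapping expects a BertTokenizer for model: bert,
# so any other tokenizer class will lead to the mismatch described above.
print(isinstance(tok, (BertTokenizer, BertTokenizerFast)))
```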
As an alternative, if you’re looking for dense embeddings in Mongolian, you could also try using the BytePairFeaturizer from rasa-nlu-examples, which has a Mongolian model of dense sub-word embeddings. See here for installation and usage instructions.
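A config sketch based on the rasa-nlu-examples documentation (the vs/dim values here are just an example; pick a vocabulary size and embedding dimension that are actually available for Mongolian):

```yaml
pipeline:
  - name: rasa_nlu_examples.featurizers.dense.BytePairFeaturizer
    lang: mn
    vs: 10000
    dim: 100
```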
This should be independent of Rasa 2 vs. Rasa 3, since the way the Hugging Face models are integrated did not change afaik. The ones you listed unfortunately did not work for me: they either use a tokenizer other than the standard mapped one, or they don’t have a pretrained TensorFlow model available (which is what Rasa uses), only PyTorch.
However, have you already tried the default multilingual Bert model and tested it for your use case?
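In case you haven’t, that would just mean swapping the weights for the public multilingual ones, which do use the standard BertTokenizer and ship TensorFlow weights:

```yaml
pipeline:
  - name: LanguageModelFeaturizer
    model_name: bert
    model_weights: bert-base-multilingual-cased
```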
Hi, I wasn’t able to reproduce the exact same error, but got a different one, which most likely happens because the model has no TensorFlow weights included, just PyTorch (as can be seen in the tags on the HF model page). There is already a ticket open here on improving the documentation of LanguageModelFeaturizer, so that it becomes clearer which HF models can be used and how.
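You can verify the missing TensorFlow weights yourself with a quick check (a sketch; loading the TF class fails when the repo only ships PyTorch weights and no tf_model.h5):

```python
from transformers import TFAutoModel

# Raises an OSError if the repository contains no TensorFlow weights
# (tf_model.h5), which is what Rasa needs in order to load the model.
model = TFAutoModel.from_pretrained("tugstugi/bert-base-mongolian-uncased")
```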
In the meantime, so that you can keep making progress, you could try the BytePairFeaturizer mentioned above in your pipeline, which also provides dense subword embeddings for Mongolian.
I don’t work at Rasa, I’m just a volunteer moderator on the forum.
If you’re sure this is an issue from Rasa and not from your side, you can open an issue on GitHub. If you do so, please post the link to that issue here so that people can follow it in the future.