How to import huggingface models to Rasa?

Hi @ganbaa_elmer, I think the error you’re seeing comes from the way the model name and weights are mapped to the corresponding Hugging Face classes. I tested this with Rasa version 3.0.2 and the config

- name: LanguageModelFeaturizer
  model_name: bert
  model_weights: tugstugi/bert-base-mongolian-uncased

and am getting an error as well. If you’re using a different Rasa version the concrete reason might be different though.

According to here, if you specify model_name: bert, Rasa tries to initialize a BertTokenizer from the given weights (in your case tugstugi/bert-base-mongolian-uncased). However, if you check which tokenizer this model actually uses directly in HF transformers with

from transformers import AutoTokenizer

# let transformers pick the tokenizer class registered for this checkpoint
tok = AutoTokenizer.from_pretrained("tugstugi/bert-base-mongolian-uncased")
print(type(tok))

you get

<class 'transformers.models.albert.tokenization_albert_fast.AlbertTokenizerFast'>

So there seems to be a mismatch between the tokenizer the model actually uses and the one Rasa tries to load. Since the model-to-tokenizer mapping is hard-coded, I think it is currently only possible to use Bert models that also use the BertTokenizer. This is not transparent from the documentation and hard to see on the HF model hub, so I would suggest opening a ticket to improve the documentation on that.
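If you want to check a checkpoint before wiring it into your pipeline, a small helper like this can flag the mismatch up front. This is just a sketch of mine, not part of Rasa: it assumes (as described above) that Rasa's `model_name: bert` only works with checkpoints whose tokenizer is a plain BERT tokenizer. In practice you would get the class name from `type(AutoTokenizer.from_pretrained(weights)).__name__` in transformers.

```python
# Hypothetical helper: decide whether a checkpoint's tokenizer class is
# compatible with Rasa's hard-coded mapping for model_name: bert.
# Assumption: only BertTokenizer / BertTokenizerFast are accepted.
BERT_COMPATIBLE = {"BertTokenizer", "BertTokenizerFast"}

def compatible_with_rasa_bert(tokenizer_cls_name: str) -> bool:
    """Return True if this tokenizer class should work with model_name: bert."""
    return tokenizer_cls_name in BERT_COMPATIBLE

print(compatible_with_rasa_bert("BertTokenizerFast"))    # True
print(compatible_with_rasa_bert("AlbertTokenizerFast"))  # False -- your case
```

For tugstugi/bert-base-mongolian-uncased, the class name is AlbertTokenizerFast (see the snippet above), so the check fails, which matches the error you're seeing.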

As an alternative, if you’re looking for dense embeddings in Mongolian, you could also try using the BytePairFeaturizer from rasa-nlu-examples, which has a Mongolian model of dense sub-word embeddings. See here for installation and usage instructions.
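For reference, a pipeline entry for it would look roughly like this. Treat the parameter values as placeholders: the component path, lang code, vocabulary size (vs), and embedding dimension (dim) are my assumptions from memory of the rasa-nlu-examples docs, so please check them against the linked instructions.

```yaml
pipeline:
  # BytePairFeaturizer from rasa-nlu-examples (installed separately);
  # lang/vs/dim values below are illustrative, not verified defaults
  - name: rasa_nlu_examples.featurizers.dense.BytePairFeaturizer
    lang: mn       # Mongolian
    vs: 10000      # vocabulary size of the pretrained BytePair model
    dim: 100       # embedding dimension
```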