Unable to download Hugging Face model

Hello,

I use Rasa v2.0.2

I get an error when I run rasa train nlu: Rasa can't download the Hugging Face model. This is the error:

OSError: Can't load tokenizer for 'camembert-base'. If you were trying to load it from 'Models - Hugging Face', make sure you don't have a local directory with the same name. Otherwise, make sure 'camembert-base' is the correct path to a directory containing all relevant files for a BertTokenizer tokenizer.
2022-09-13 16:08:09 WARNING urllib3.connectionpool - Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Proxy Authentication Required',))': /api/2801673/store/

However, when I run the transformers command directly, I have no problem downloading the model.

from transformers import AutoTokenizer, AutoModelForMaskedLM
tokenizer = AutoTokenizer.from_pretrained("camembert-base")
Downloading: 100%|██████████| 28.0/28.0 [00:00<00:00, 26.4kB/s]
Downloading: 100%|██████████| 570/570 [00:00<00:00, 437kB/s]
Downloading: 100%|██████████| 226k/226k [00:00<00:00, 664kB/s]
Downloading: 100%|██████████| 455k/455k [00:00<00:00, 1.09MB/s]

I have not found any specific proxy settings for Rasa in the docs. Has anyone experienced this problem?
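(For reference, transformers downloads models through the requests library, which honours the standard HTTP_PROXY / HTTPS_PROXY environment variables. A rough sketch of setting them before Rasa starts; the proxy address and credentials below are placeholders, not values from this thread:

import os

# Placeholder proxy URL with credentials (replace with your own).
# requests, which transformers uses for model downloads, reads these
# variables, so a rasa train nlu run started from the same environment
# can tunnel through an authenticating proxy.
proxy = "http://user:password@proxy.example.com:8080"
os.environ["HTTP_PROXY"] = proxy
os.environ["HTTPS_PROXY"] = proxy

The same effect can be had by exporting the two variables in the shell before running rasa train nlu.)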

My config:

language: fr

pipeline:

- name: HFTransformersNLP
  model_name: "bert"
  model_weights: "camembert-base"
  cache_dir: /xxx/yyyy/.cache # required with Botfront
- name: LanguageModelTokenizer
- name: LanguageModelFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier

Hi! Were you able to solve this problem?

Hello,

I managed to make this work, although I had the same problem originally. The config I am using now is:

pipeline:
- name: SpacyNLP
  model: "fr_core_news_lg" #python -m spacy download fr_core_news_lg 
- name: SpacyTokenizer
- name: CountVectorsFeaturizer
  analyzer: word
  OOV_token: oov
  strip_accents: ascii
- name: LanguageModelFeaturizer
  #model_name: "bert"
  #model_weights: "rasa/LaBSE"
  #cache_dir: "./LaBSE"
  model_name: "camembert"
  model_weights: "camembert-base"
  cache_dir: "./camembert"
- name: DIETClassifier
  intent_tokenization_flag: true
  intent_split_symbol: +
  epochs: 100
  constrain_similarities: true
- name: EntitySynonymMapper

As you can see, I tested "rasa/LaBSE" and afterwards came back to camembert. Then it worked. :man_shrugging:
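(If the proxy still blocks downloads when rasa train runs, another option is to pre-download the weights into the featurizer's cache_dir from a session where the download works. A rough sketch, assuming Rasa forwards cache_dir to transformers' from_pretrained; the paths mirror the config above:

from transformers import AutoTokenizer, TFAutoModel

# Download the tokenizer and the TensorFlow weights once into the
# directory that LanguageModelFeaturizer's cache_dir points to, so
# rasa train can load them from disk instead of reaching the Hub.
cache_dir = "./camembert"
AutoTokenizer.from_pretrained("camembert-base", cache_dir=cache_dir)
TFAutoModel.from_pretrained("camembert-base", cache_dir=cache_dir))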

Let me know. Cheers. Camille