RASA and camemBERT

Hi everyone! I wanted to know if there is a way to use the French version of BERT (CamemBERT). I looked at how the other BERT versions are implemented (rasa.nlu.utils.hugging_face) and wanted to know if anyone has tried modifying these files to add another version of BERT.

Thank you very much!

Why not run your own NLU service and expose it over HTTP?

You can use Rasa Core alone and call your NLU service to parse text.

Thanks for your answer! I wanted to benefit from Rasa's training process :)

Solved! I heard there had been some attempts at using CamemBERT here and went looking for them.

There is no need to modify the modules that pip installs for rasa[transformers]; only the Rasa source code needs to change.

  1. Monkeypatch the dictionary in modeling_tf_camembert so that it points to the CamemBERT model weights: add the following lines at the beginning of the hf_transformers.py file (rasa/nlu/utils/hugging_face)

from transformers import modeling_tf_camembert
modeling_tf_camembert.TF_CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_MAP["camembert-base"] = "https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-camembert-base/tf_model.h5"

  2. Add CamemBERT to the registry.py file (path: rasa/nlu/utils/hugging_face/registry.py). Here is an example: registry.py (3.0 KB)

  3. config.yml file:

pipeline:
  - name: HFTransformersNLP
    model_name: "camembert"
    model_weights: "camembert-base"
    cache_dir: "data/cache_membert"
  - name: "LanguageModelTokenizer"
  - name: "LanguageModelFeaturizer"
  - name: "DIETClassifier"

Hi,

I am currently working with CamemBERT and was wondering whether you succeeded in creating a French chatbot based on this model. If you did, are the results good?

Thx.

Hi !

It is still in development, but the results are better than I expected. My NLU data consists of 27 intents, with around 10 examples per intent on average.

When the input was really close to an example given in the nlu.md file, the model always predicted the correct intent with a confidence close to 1. Moreover, when a word is not present in the NLU data but is semantically close to a word that is, the model often predicted the correct intent with a lower confidence (around 0.50). It can still confuse two closely related intents, though (what_time_is_it / what_day_is_it).
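A minimal sketch of how those two observations (lower confidence on unseen words, confusion between close intents) could be handled downstream. It assumes the parse result carries an `intent_ranking` list of `name`/`confidence` entries; the function name `pick_intent` and both threshold values are hypothetical and would need tuning on real data.

```python
def pick_intent(parse_result, min_confidence=0.6, min_margin=0.2):
    """Return the top intent name, or None when the prediction looks unreliable.

    min_confidence and min_margin are illustrative values, not recommendations.
    """
    ranking = sorted(
        parse_result.get("intent_ranking", []),
        key=lambda item: item["confidence"],
        reverse=True,
    )
    if not ranking:
        return None

    top = ranking[0]
    # Reject predictions the model itself is unsure about.
    if top["confidence"] < min_confidence:
        return None
    # Reject predictions where the runner-up is almost as likely,
    # e.g. what_time_is_it vs what_day_is_it.
    if len(ranking) > 1 and top["confidence"] - ranking[1]["confidence"] < min_margin:
        return None
    return top["name"]
```

When the function returns None, the bot could ask the user to rephrase instead of acting on a doubtful intent.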

I hope it helps !

Sounds cool, I'll try to build my own chatbot too. Thank you very much! :)

Maybe you should create a single "What is it" intent and use entities to choose your response for time, day, or anything else… ;)

Your two intents are very close :)
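A rough sketch of that idea, assuming the parse result exposes `intent` and `entities` in the usual `entity`/`value` form. The intent name `what_is_it`, the entity name `subject`, and the response names are all hypothetical placeholders.

```python
# Hypothetical mapping from the extracted entity value to a response name.
RESPONSES_BY_SUBJECT = {
    "time": "utter_time",
    "day": "utter_day",
}


def choose_response(parse_result):
    """Pick a response for the merged intent based on the extracted entity."""
    if parse_result.get("intent", {}).get("name") != "what_is_it":
        return None  # not the merged intent, let other logic handle it

    for entity in parse_result.get("entities", []):
        if entity.get("entity") == "subject":
            response = RESPONSES_BY_SUBJECT.get(entity.get("value"))
            if response is not None:
                return response

    # No usable entity was extracted: ask the user what they meant.
    return "utter_ask_clarification"
```

The classifier then only has to recognise one intent, and the distinction between time and day moves to entity extraction.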

I’m back on that CamemBERT thing.

So I tried to build my own bot, but I can't get anything to work when I try to use the CamemBERT model. I followed what you said, but when I try to train I get errors:

File "_ruamel_yaml.pyx", line 706, in _ruamel_yaml.CParser.get_single_node
File "_ruamel_yaml.pyx", line 724, in _ruamel_yaml.CParser._compose_document
File "_ruamel_yaml.pyx", line 775, in _ruamel_yaml.CParser._compose_node
File "_ruamel_yaml.pyx", line 891, in _ruamel_yaml.CParser._compose_mapping_node
File "_ruamel_yaml.pyx", line 904, in _ruamel_yaml.CParser._parse_next_event
ruamel.yaml.parser.ParserError: while parsing a block mapping
in "", line 3, column 1
did not find expected key
in "", line 8, column 1

I don't understand what is wrong. Are you sure the following line works?

modeling_tf_camembert.TF_CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_MAP["camembert-base"]=…

Because when I try it on Colab, it doesn't.

I'd love to use the French model for a bot. Could you help me? Thx

Solved!!!

I finally managed to load the model by following your instructions.

To be more precise:

  1. The first step works as described.

  2. In registry.py, there is a sneaky TFCamembertModel that I had missed and needed to add.

  3. The indentation of the config file must be perfect (tricky error…):

language: fr  
pipeline: 
    - name: HFTransformersNLP  
      model_name: "camembert"  
      model_weights: "camembert-base"  
      cache_dir: "data/cache_membert"
    - name: "LanguageModelTokenizer"  
    - name: "LanguageModelFeaturizer"  
    - name: "DIETClassifier"  
      epochs: 200  
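Since the ParserError above only reports line and column numbers, a quick pre-check can help spot the offending line. The following is a rough heuristic sketch, not a YAML parser: it assumes a flat pipeline like the one above, where every key of a list item must line up with the first key after the "- ". The function name `find_misaligned_keys` is hypothetical.

```python
def find_misaligned_keys(config_text):
    """Return (line_number, message) pairs for suspicious indentation."""
    problems = []
    expected_indent = None  # indentation required for keys of the current list item

    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments

        leading = line[: len(line) - len(line.lstrip())]
        if "\t" in leading:
            problems.append((number, "tab used for indentation"))
        indent = len(leading)

        if stripped.startswith("- "):
            # Keys belonging to this item must align with the text after "- ".
            expected_indent = indent + 2
        elif indent == 0:
            expected_indent = None  # back at a top-level key such as "pipeline:"
        elif expected_indent is not None and indent != expected_indent:
            problems.append(
                (number, f"expected {expected_indent} spaces, found {indent}")
            )
    return problems
```

An empty result does not guarantee that the file is valid YAML; it only rules out this particular alignment mistake.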

Hope this helps, and thanks again to @Zoukero

Thanks for sharing that you got it working! :D

I want to use CamemBERT but I am running into some issues. I applied this solution, but I guess it is outdated. Has anyone used CamemBERT recently?

Downgrade the transformers library. I guess it should be some version around 2.5.X.

I downgraded the transformers library and it worked.

I ran this: pip install --upgrade transformers==2.5.0

Thanks @zack for your answer :)
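A small sketch for checking which transformers version is actually installed before training, using only the standard library (`importlib.metadata`, Python 3.8+). The helper names are hypothetical, and treating the whole 2.5.x line as acceptable is an assumption based on the "around 2.5.X" advice above; only 2.5.0 was reported as working.

```python
from importlib import metadata


def installed_version(package="transformers"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_release_line(version, major=2, minor=5):
    """True when the version string belongs to the given major.minor line."""
    if version is None:
        return False
    parts = version.split(".")
    try:
        return (int(parts[0]), int(parts[1])) == (major, minor)
    except (IndexError, ValueError):
        return False  # unexpected version format


if __name__ == "__main__":
    found = installed_version()
    print("transformers version:", found)
    print("in the 2.5.x line:", matches_release_line(found))
```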

It worked! But in addition to the steps described above, I had to add camembert to the MAX_SEQUENCE_LENGTHS dict in rasa/nlu/utils/hugging_face.py

MAX_SEQUENCE_LENGTHS = {
    "camembert": 512,
    "bert": 512,
    "gpt": 512,
    "gpt2": 512,
    "xlnet": NO_LENGTH_RESTRICTION,
    "distilbert": 512,
    "roberta": 512,
}
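To illustrate why the missing entry matters, here is a standalone sketch of a lookup against such a table. This is not Rasa's code: the helper `max_length_for` is hypothetical, and the value used for NO_LENGTH_RESTRICTION is a placeholder, since the thread does not show it.

```python
NO_LENGTH_RESTRICTION = -1  # placeholder value, assumed for this sketch

MAX_SEQUENCE_LENGTHS = {
    "camembert": 512,
    "bert": 512,
    "gpt": 512,
    "gpt2": 512,
    "xlnet": NO_LENGTH_RESTRICTION,
    "distilbert": 512,
    "roberta": 512,
}


def max_length_for(model_name):
    """Return the token limit for a model, or None when there is no limit."""
    if model_name not in MAX_SEQUENCE_LENGTHS:
        # A model missing from the table fails here, which is why
        # "camembert" has to be registered alongside the other models.
        raise KeyError(f"no maximum sequence length registered for '{model_name}'")
    limit = MAX_SEQUENCE_LENGTHS[model_name]
    return None if limit == NO_LENGTH_RESTRICTION else limit
```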
