RASA and camemBERT

Hi everyone! I wanted to know if there is a way to implement the French version of BERT (camemBERT). I saw how the other BERT versions are implemented (rasa.nlu.utils.hugging_face) and wanted to know if anyone has tried playing with these files to add another version of BERT!

Thank you very much !

Why not use your own NLU service exposed over HTTP?

You can use rasa core alone and use your nlu service to parse text.

Thanks for your answer! I wanted to benefit from Rasa's training process :slight_smile:

Solved! I had heard there were some attempts with camemBERT here and went looking for them.

There is no need to modify the transformers modules installed via pip (rasa[transformers]); only the Rasa source code needs to change.

  1. Monkeypatch the dictionary from modeling_tf_camembert to provide the link to the camemBERT model: add the following lines at the beginning of the hf_transformers.py file (rasa/nlu/utils/hugging_face):

from transformers import modeling_tf_camembert

modeling_tf_camembert.TF_CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_MAP["camembert-base"] = "https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-camembert-base/tf_model.h5"

  2. Add camemBERT to the registry.py file (path: rasa/nlu/utils/hugging_face/registry.py). Here is an example: registry.py (3.0 KB); a sketch of the additions is shown after this list.

  3. config.yml file:

pipeline:
  - name: HFTransformersNLP
    model_name: "camembert"
    model_weights: "camembert-base"
    cache_dir: "data/cache_membert"
  - name: "LanguageModelTokenizer"
  - name: "LanguageModelFeaturizer"
  - name: "DIETClassifier"
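If you can't open the attachment, the registry additions look roughly like this. The dict and helper names below are assumed from the registry.py shipped with Rasa 1.8/1.9 and transformers 2.5.x, and the pre/post-processor lines simply mirror the existing roberta entries (CamemBERT is RoBERTa-based), so double-check them against your own files:

from transformers import TFCamembertModel, CamembertTokenizer

# Add a "camembert" entry to each lookup dict in
# rasa/nlu/utils/hugging_face/registry.py (either inside the dict literals or
# after them, as below). The entries mirror the existing "roberta" ones.
model_class_dict["camembert"] = TFCamembertModel
model_tokenizer_dict["camembert"] = CamembertTokenizer
model_weights_defaults["camembert"] = "camembert-base"
# Assumed: reuse whatever helpers your registry.py maps to "roberta".
model_special_tokens_pre_processors["camembert"] = roberta_tokens_pre_processor
model_embeddings_post_processors["camembert"] = roberta_embeddings_post_processor
model_tokens_cleaners["camembert"] = gpt2_tokens_cleaner  # RoBERTa-style BPE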

Hi,

I am currently working with CamemBERT and was wondering if you had succeeded in creating a French chatbot based on this model? If you did, are the results good?

Thx.

Hi !

It is still in development, but the results are better than I expected. The NLU data I used consists of 27 intents with an average of around 10 examples per intent.

When the input was really close to an example given in the nlu.md file, the model always predicted the correct intent with a confidence close to 1. Moreover, when a word is not in your NLU data but is semantically close to a word you did specify, the model often predicted the correct intent with a lower confidence (around 0.50); but it can make mistakes between two close intents (what_time_is_it / what_day_is_it).

I hope it helps !

Sounds cool, I'll try to build my own chatbot too, thank you very much! :slight_smile:


Maybe you should create a single intent like "what_is_it" and switch your response based on entities for time, day, or anything else… :wink:

Your two intents are very close :slight_smile:

I’m back on that CamemBERT thing.

So I tried to build my own bot but can't get anywhere when I try to use the CamemBERT model. I followed what you said, but when I try to train I get errors:

File "_ruamel_yaml.pyx", line 706, in _ruamel_yaml.CParser.get_single_node
File "_ruamel_yaml.pyx", line 724, in _ruamel_yaml.CParser._compose_document
File "_ruamel_yaml.pyx", line 775, in _ruamel_yaml.CParser._compose_node
File "_ruamel_yaml.pyx", line 891, in _ruamel_yaml.CParser._compose_mapping_node
File "_ruamel_yaml.pyx", line 904, in _ruamel_yaml.CParser._parse_next_event
ruamel.yaml.parser.ParserError: while parsing a block mapping
in "", line 3, column 1
did not find expected key
in "", line 8, column 1

I don't understand what is wrong. Are you sure the following line is working?

modeling_tf_camembert.TF_CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_MAP["camembert-base"]=…

Because when I try it on Colab, it doesn't work.
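To show what I mean, the minimal Colab cell I'm talking about would look something like this (assuming transformers ~2.5.x, where modeling_tf_camembert exposes that archive map):

from transformers import modeling_tf_camembert

# Register a download URL for the TF weights of "camembert-base" (the jplu
# conversion), then print the map to see whether the entry was added.
modeling_tf_camembert.TF_CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_MAP["camembert-base"] = "https://s3.amazonaws.com/models.huggingface.co/bert/jplu/tf-camembert-base/tf_model.h5"
print(modeling_tf_camembert.TF_CAMEMBERT_PRETRAINED_MODEL_ARCHIVE_MAP)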

I'd love to use the French model for a bot, could you help me? Thx

Solved !!!

Finally I managed to load the model following your instructions.

To be more precise:

  1. The first step is working well

  2. In registry.py, there is a sneaky TFCamembertModel import that I didn't see and needed to add.

  3. The indentation of the config file must be perfect (tricky error…); a quick check is sketched after the config:

language: fr  
pipeline: 
    - name: HFTransformersNLP  
      model_name: "camembert"  
      model_weights: "camembert-base"  
      cache_dir: "data/cache_membert"
    - name: "LanguageModelTokenizer"  
    - name: "LanguageModelFeaturizer"  
    - name: "DIETClassifier"  
      epochs: 200  
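By the way, that kind of indentation error can be caught before training by loading config.yml with the same YAML library Rasa uses (ruamel.yaml, as in the traceback above); a minimal check could be:

from ruamel.yaml import YAML

# Parse config.yml with ruamel.yaml: broken indentation raises the same
# ParserError as above, while a clean parse prints the pipeline as a dict.
with open("config.yml") as f:
    print(YAML(typ="safe").load(f))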

Hope it will help and thanks again to @Zoukero


Thanks for sharing that you succeeded in using it! :smiley:

I want to use camemBERT but I have some issues. I applied this solution, but unfortunately I guess it is outdated. Has anyone used camemBERT recently?

Downgrade the Transformers library; I guess it should be some version around 2.5.x.


I downgraded the transformers library and it worked.

I did this: pip install --upgrade transformers==2.5.0

Thanks @zack for your answer :slight_smile:


It worked! But in addition to the steps described above, I had to add camembert to the MAX_SEQUENCE_LENGTHS dict in rasa/nlu/utils/hugging_face.py

MAX_SEQUENCE_LENGTHS = {
    "camembert": 512,
    "bert": 512,
    "gpt": 512,
    "gpt2": 512,
    "xlnet": NO_LENGTH_RESTRICTION,
    "distilbert": 512,
    "roberta": 512,
}
