Using a pretrained model ("vinai/phobert-base") with HFTransformersNLP in NLU: tokenizer error

Hi all, I have used HFTransformersNLP to train NLU, but with model_weights initialized to "vinai/phobert-base" and WhitespaceTokenizer, tokenizer.encode always returns NoneType. The error points to lib\site-packages\transformers\tokenization_utils_base.py, line 2654.
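To narrow the problem down, here is a minimal check I can run outside Rasa (just a debugging sketch, assuming the transformers package is installed) to see whether the PhoBERT tokenizer itself loads and encodes text:

# Sanity check, outside Rasa, that the Hugging Face tokenizer behind
# model_weights "vinai/phobert-base" loads and encodes a sample sentence.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

ids = tokenizer.encode("xin chào", add_special_tokens=True)
print(ids)                                   # should be a list of token ids, not None
print(tokenizer.convert_ids_to_tokens(ids))  # the corresponding subword tokens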

@tacsenlp Can you please share your config.yml file for reference? And what command are you running when you see this error message?

@tacsenlp Can you please format the config.yml file? Thanks.

(screenshot of config.yml attached)

@tacsenlp Right!

Alert: HFTransformersNLP is deprecated and will be removed in Rasa 3.0; LanguageModelFeaturizer now implements its behaviour.

Solution:

To use the HFTransformersNLP component, install Rasa Open Source with pip3 install rasa[transformers].

Or, if you are already aware of this, please share a screenshot of the error message.

I guess this will solve your issue bro! Good Luck!

I am using Rasa 2.8.7.

@tacsenlp You can also try this:

model_weights: "rasa/LaBSE"

I want to use it for the Vietnamese language; LaBSE is not a good fit.

@tacsenlp Oh, you didn't mention that. OK, let me look for a solution for you.

Thank you!!

@tacsenlp Meanwhile, check this paper: (PDF) Enhancing Rasa NLU model for Vietnamese chatbot, and focus on the pipeline it mentions.

@tacsenlp Also check this solution thread of mine: Rasa Train Error Function call stack: train_on_batch - #15 by trinhminhhieu

@tacsenlp Also check this: [ASK] Process get killed when training RASA core - #9 by nik202

I hope this will solve your issue.


@nik202 bert-base-multilingual-cased is supported, but phobert-base is best for the Vietnamese language… thank you so much!!!
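For example, to see the difference for Vietnamese I can tokenize the same sentence with both models (just an illustration with a made-up sentence; PhoBERT's model card also recommends word-segmenting Vietnamese text, e.g. with VnCoreNLP, before feeding it in):

# Illustrative comparison of how the two tokenizers split the same Vietnamese
# sentence into subwords. The sentence is only an example for this check.
from transformers import AutoTokenizer

for name in ["bert-base-multilingual-cased", "vinai/phobert-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.tokenize("tôi muốn đặt vé máy bay"))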

@tacsenlp Right, good to know 🙂 Can I please request that you close this thread with a solution, for other Vietnamese users and for your own reference? Good luck!


@nik202 I have a problem: I don't know how to debug the NLU trainer. For example, I want to see the vectors before training; how can I do this?

@tacsenlp Can I ask why you want to see that, and what its significance is?

@nik202 This is just an example of what I need: debugging the tokenizer, the features, and so on. I want to check that my custom components are correct.
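For instance, one quick check I can do outside Rasa's own pipeline (a sketch only, assuming transformers and torch are installed) is to run the same pretrained model directly and look at the tokens and dense vectors it produces for a message:

# Sketch, not Rasa's internal API: inspect the tokens and dense vectors a
# Hugging Face model produces for one message, before any Rasa training.
# The model name matches the one discussed above.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("xin chào", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))  # subword tokens
print(outputs.last_hidden_state.shape)  # (1, num_tokens, hidden_size) dense vectors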

@nik202 And one more thing: responses from utterances in domain.yml are fast (200-300 ms), but responses from a custom action are slow (2-3 s), even though it only returns a simple message.

@tacsenlp I still don't get you. Do you want to see the conversation data, or what? I am confused by "vector" (to me that means a dense vector).

@tacsenlp That is normal behaviour. What is your frontend, or are you just using rasa shell --debug?

@nik202 I am just using rasa shell --debug.

@tacsenlp Apologies, I did not get what you are looking for. If you have something to share, please share it.