Using a pretrained model ("vinai/phobert-base") with HFTransformersNLP in NLU: tokenizer error

Hi all, I have used HFTransformersNLP to train NLU, but with model_weights initialized to "vinai/phobert-base" and WhitespaceTokenizer, tokenizer.encode always returns NoneType. The error points to lib\site-packages\transformers\tokenization_utils_base.py, line 2654.
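To narrow the problem down, here is a minimal check I can run outside Rasa (just a debugging sketch, assuming the transformers package is installed) to see whether the PhoBERT tokenizer itself loads and encodes text:

# Sanity check, outside Rasa, that the Hugging Face tokenizer behind
# model_weights "vinai/phobert-base" loads and encodes a sample sentence.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

ids = tokenizer.encode("xin chào", add_special_tokens=True)
print(ids)                                   # should be a list of token ids, not None
print(tokenizer.convert_ids_to_tokens(ids))  # the corresponding subword tokens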

@tacsenlp Can you please share your config.yml file for reference? And what command are you running when you see this error message?

@tacsenlp Can you please format the config.yml file? Thanks.

(screenshot of config.yml attached)

@tacsenlp Right!

Alert: HFTransformersNLP is deprecated and will be removed in Rasa 3.0; LanguageModelFeaturizer now implements its behaviour.

Solution:

To use the HFTransformersNLP component, install Rasa Open Source with pip3 install rasa[transformers].

Or, if you are already aware of this, please share a screenshot of the error message.

I guess this will solve your issue bro! Good Luck!

I am using Rasa 2.8.7.

@tacsenlp You can also try this:

model_weights: "rasa/LaBSE"

I want to use it for the Vietnamese language; LaBSE is not a good fit.

@tacsenlp Oh, you didn't mention that. OK, let me look for a solution for you.

Thank you!!

@tacsenlp Meanwhile, check this paper: (PDF) Enhancing Rasa NLU model for Vietnamese chatbot, and focus on the pipeline it mentions.

@tacsenlp Also check this solution thread of mine: Rasa Train Error Function call stack: train_on_batch - #15 by trinhminhhieu

@tacsenlp Also check this: [ASK] Process get killed when training RASA core - #9 by nik202

I hope this will solve your issue.


@nik202 bert-base-multilingual-cased is supported, but phobert-base is best for the Vietnamese language… thank you so much!!!
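For example, to see the difference for Vietnamese I can tokenize the same sentence with both models (just an illustration with a made-up sentence; PhoBERT's model card also recommends word-segmenting Vietnamese text, e.g. with VnCoreNLP, before feeding it in):

# Illustrative comparison of how the two tokenizers split the same Vietnamese
# sentence into subwords. The sentence is only an example for this check.
from transformers import AutoTokenizer

for name in ["bert-base-multilingual-cased", "vinai/phobert-base"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(name, tok.tokenize("tôi muốn đặt vé máy bay"))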

@tacsenlp Right, good to know 🙂 Can I please request that you close this thread with a solution, for other Vietnamese users and for your own reference? Good luck!


@nik202 I have a problem: I don't know how to debug the NLU trainer. For example, I want to see the vectors before training; how can I do this?

@tacsenlp Can I ask why you want to see that, and what its significance is?

@nik202 This is just an example of what I need: debugging the tokenizer, the features, and so on. I want to check that my custom components are correct.
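For instance, one quick check I can do outside Rasa's own pipeline (a sketch only, assuming transformers and torch are installed) is to run the same pretrained model directly and look at the tokens and dense vectors it produces for a message:

# Sketch, not Rasa's internal API: inspect the tokens and dense vectors a
# Hugging Face model produces for one message, before any Rasa training.
# The model name matches the one discussed above.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

inputs = tokenizer("xin chào", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0]))  # subword tokens
print(outputs.last_hidden_state.shape)  # (1, num_tokens, hidden_size) dense vectors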

@nik202 And one more thing: responses from utterances in domain.yml are fast (200-300 ms), but responses from a custom action are slow (2-3 s), even though it only returns a simple message.

@tacsenlp I still don't get you. Do you want to see the conversation data, or what? I am confused by "vector" (to me that means a dense vector).

@tacsenlp That is normal behaviour. What is your frontend, or are you just using rasa shell --debug?

@nik202 I am just using rasa shell --debug.

@tacsenlp Apologies, I did not get what you are looking for. If you have something to share, please share it.