Adding a tokenizer to a predefined pipeline (for languages like Chinese)

I am trying to build a Chinese chatbot with the ‘tensorflow_embedding’ predefined pipeline. Since ‘tokenizer_whitespace’ won’t work for Chinese, I want to try ‘tensorflow_embedding’ with ‘tokenizer_jieba’. So:

  1. Is there a way to use a specific component with a specific predefined pipeline?
  2. If not, where can I find the components used in the ‘tensorflow_embedding’ pipeline, so that I can manually modify the config file to use ‘tokenizer_jieba’?

Check out the docs for the JiebaTokenizer here.
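
As far as I know, you can’t swap a single component into a pipeline template by name; instead, you list the template’s components explicitly in your config file and replace the tokenizer there. Below is a minimal sketch of such a config, assuming the component list given in the Rasa NLU docs for the ‘tensorflow_embedding’ template (exact names may differ between versions), with ‘tokenizer_jieba’ in place of ‘tokenizer_whitespace’:

```yaml
# config.yml — the ‘tensorflow_embedding’ template spelled out component by
# component, so the tokenizer can be replaced. Component names assume the
# Rasa NLU docs for this template and may differ in other versions.
language: "zh"                                      # tokenizer_jieba only supports Chinese

pipeline:
- name: "tokenizer_jieba"                           # replaces tokenizer_whitespace
- name: "ner_crf"                                   # CRF entity extraction
- name: "ner_synonyms"                              # maps entity synonyms to canonical values
- name: "intent_featurizer_count_vectors"           # bag-of-words features over the tokens
- name: "intent_classifier_tensorflow_embedding"    # the embedding intent classifier
```

Note that jieba itself has to be installed separately (`pip install jieba`) before training with this config.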
