Configure Rasa pipeline for Thai Language

vss · April 18, 2024, 8:58am

Hello Rasa Community,

I’m currently developing a chatbot in Thai and am having trouble tokenizing Thai text. Because Thai is written without spaces between words, it can’t be tokenized on whitespace, so the bot struggles to recognize the intents and slots in user messages.
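To illustrate the problem, here is a minimal sketch in plain Python (no Rasa involved). The sample sentence and the four-word list are made up for this example, and the greedy longest-match function is only a stand-in for what a real Thai word segmenter does, not a Rasa or spaCy API:

```python
# Illustration only: the sample words below are assumptions, not data from my bot.
words = ["ฉัน", "อยาก", "สั่ง", "พิซซ่า"]  # roughly "I", "want", "order", "pizza"
text = "".join(words)  # Thai is written without spaces between words

# Whitespace splitting returns the whole sentence as a single token.
whitespace_tokens = text.split()

# Hypothetical mini dictionary; a real segmenter ships a full Thai lexicon.
vocab = set(words)


def longest_match_tokenize(sentence, lexicon):
    """Greedy longest-match segmentation against a word list."""
    tokens = []
    max_len = max(len(w) for w in lexicon)
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, then shrink until a lexicon word fits.
        for j in range(min(len(sentence), i + max_len), i, -1):
            if sentence[i:j] in lexicon:
                tokens.append(sentence[i:j])
                i = j
                break
        else:
            # Unknown character: emit it alone so the loop always advances.
            tokens.append(sentence[i])
            i += 1
    return tokens


dictionary_tokens = longest_match_tokenize(text, vocab)
print(len(whitespace_tokens), len(dictionary_tokens))
```

With a whitespace-based tokenizer the featurizers only ever see one giant token per message, which I assume is why intent and slot recognition is so poor.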

I’m looking for advice on how to modify my pipeline to handle Thai more effectively. I’ve read about using spaCy for Thai tokenization, but I couldn’t get it to work on my end. Here are my specific questions:

1. Using spaCy for Thai tokenization: How can I use spaCy’s Thai tokenizer in my pipeline? What changes are needed in the pipeline configuration?
2. If spaCy is not an option, what else can I use for tokenization?
