Configure Rasa pipeline for Thai Language

Hello Rasa Community,

I’m currently developing a chatbot for the Thai language and I’m running into a problem with tokenizing Thai text. Because Thai doesn’t separate words with whitespace, a whitespace-based tokenizer can’t split sentences into words, so the bot struggles to recognize the intents and slots in the user’s messages.
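
To make the problem concrete, here is a minimal reproduction in plain Python. Note that `str.split()` is only a stand-in for a whitespace-based tokenizer (it is not Rasa’s actual implementation), and the sample sentences are made up for illustration:

```python
# Toy reproduction: whitespace splitting on English vs. Thai text.
# str.split() is an assumed stand-in for a whitespace-based tokenizer.
english_text = "I want to order pizza"
thai_text = "ฉันอยากสั่งพิซซ่า"  # roughly "I want to order pizza"

english_tokens = english_text.split()
thai_tokens = thai_text.split()

print(len(english_tokens), english_tokens)  # five separate words
print(len(thai_tokens), thai_tokens)        # one token: the whole sentence
```

Since the whole Thai sentence comes out as a single token, any word-level features built on top of it carry very little information.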

I’m looking for advice on how to modify my pipeline to handle Thai more effectively. I’ve read about using spaCy for Thai tokenization, but I couldn’t get it working on my end. Here are my specific questions:

  • Using spaCy for Thai tokenization: How can I use spaCy’s Thai tokenizer in my pipeline? What changes does the pipeline configuration need?

  • Alternatives to spaCy: If spaCy isn’t an option, what else can I use to tokenize Thai text?
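
For reference, this is roughly the kind of word segmentation I’m after, sketched as a toy dictionary-based longest-matching tokenizer in plain Python. The function name `longest_match_tokenize` and the four-word vocabulary are hypothetical and only for illustration; this is not a Rasa component and not a substitute for a real Thai word segmenter:

```python
# Toy sketch: greedy longest-matching segmentation against a tiny vocabulary.
# WORDS and longest_match_tokenize are illustrative names, not a library API.
WORDS = ["ฉัน", "อยาก", "สั่ง", "พิซซ่า"]  # "I", "want", "order", "pizza"
VOCAB = set(WORDS)


def longest_match_tokenize(text, vocab):
    """At each position, take the longest vocabulary word that matches."""
    max_len = max(len(word) for word in vocab)
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink one character at a time.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in vocab:
                match = text[i:j]
                break
        if match is None:
            # Unknown character: emit it as its own token and move on.
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


sentence = "".join(WORDS)  # the Thai sentence written without spaces
tokens = longest_match_tokenize(sentence, VOCAB)
print(tokens)
```

A real solution would need a full dictionary or a trained segmenter, and presumably character offsets for each token as well, which is why I’m asking what people actually use in their pipelines.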