How to configure the pipeline for another language?

I have a few questions about configuring the pipeline for my chatbot. Since Thai cannot be tokenized by whitespace, I would like to use the Thai tokenizer in spaCy.

  1. If I use spaCy for Thai tokenization, how do I call it in the pipeline, and how do I send the tokenized text to my own intent classifier?
  2. If I create my own intent classifier, how can I use it in the pipeline?

Moreover, I have read an example of customizing components in this blog post: [Enhancing Rasa NLU models with Custom Components]. I don’t fully understand how a custom pipeline works, e.g. what input each component requires and what output it produces. How do I send my text to a component? Could you please explain how a custom pipeline works?

Thank you so much.

Welcome @Asw!

If you want to use the Thai tokenization from spaCy, you need to register your language model and link it to a language identifier. Rasa can then load the model when you pass that identifier as the language option in the config.yml file (see Language Support). You also need to list the SpacyTokenizer in the pipeline in the config.yml file.

If you want to use a custom intent classification model, your custom component needs to implement the interface described in Custom NLU Components. Let’s assume your custom component can be found under custom_components.ThaiIntentClassifier. You need to add it to your pipeline in the config.yml file. For example:

- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "custom_components.ThaiIntentClassifier"
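
To make the data flow a bit more concrete, here is a minimal sketch in plain Python of how components in such a pipeline hand data to each other. It deliberately does not import Rasa: Message, Component, ToyTokenizer and run_pipeline are hypothetical stand-ins invented for this illustration, and the real base class and method signatures are the ones described in Custom NLU Components. The idea it shows is that components run in the order they are listed, each one reads what earlier components stored on a shared message object, and each one adds its own output to it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Message:
    """Hypothetical stand-in for the object that travels through the pipeline."""
    text: str
    data: Dict[str, Any] = field(default_factory=dict)


class Component:
    """Hypothetical stand-in base class (not Rasa's actual Component class)."""

    def process(self, message: Message) -> None:
        raise NotImplementedError


class ToyTokenizer(Component):
    """Toy tokenizer: greedy longest match against a tiny vocabulary.

    Assumption: this only plays the role of SpacyTokenizer in the sketch;
    it does not call spaCy.
    """

    def __init__(self, vocabulary: List[str]) -> None:
        # Try longer words first so the longest match wins.
        self.vocabulary = sorted(vocabulary, key=len, reverse=True)

    def process(self, message: Message) -> None:
        text, tokens, i = message.text, [], 0
        while i < len(text):
            # Fall back to a single character when no vocabulary word matches.
            match = next(
                (w for w in self.vocabulary if text.startswith(w, i)), text[i]
            )
            tokens.append(match)
            i += len(match)
        message.data["tokens"] = tokens  # output for later components


class ThaiIntentClassifier(Component):
    """Toy classifier: keyword lookup over tokens produced earlier."""

    def __init__(self, keyword_to_intent: Dict[str, str]) -> None:
        self.keyword_to_intent = keyword_to_intent

    def process(self, message: Message) -> None:
        tokens = message.data.get("tokens", [])  # input from the tokenizer
        name = next(
            (self.keyword_to_intent[t] for t in tokens if t in self.keyword_to_intent),
            "unknown",
        )
        message.data["intent"] = {"name": name}  # this component's output


def run_pipeline(components: List[Component], text: str) -> Message:
    """Pass one message through every component, in the listed order."""
    message = Message(text)
    for component in components:
        component.process(message)
    return message


pipeline = [
    ToyTokenizer(["สวัสดี", "ครับ"]),
    ThaiIntentClassifier({"สวัสดี": "greet"}),
]
result = run_pipeline(pipeline, "สวัสดีครับ")
print(result.data["tokens"], result.data["intent"])
```

In this sketch the classifier never sees raw text being "sent" to it. It only looks up the tokens that the tokenizer stored on the message, which is why the tokenizer has to appear before the classifier in the pipeline.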

Let me know if you have any more questions.