I already tested my code on the splot task, and it works well with English. I have now switched my data to Thai, then retrained and tested. However, the default pipeline settings no longer give accurate results. I think the cause is tokenization, so I tried changing the tokenization method in the pipeline as follows:
language: th
pipeline:
  - name: "SpacyNLP"
    model: "xx_ent_wiki_sm"
  - name: "SpacyTokenizer"
The spaCy library and the spaCy model "xx_ent_wiki_sm" are both installed, but the results are still inaccurate. I have three questions:
- Is the configuration above correct?
- Are there any examples of a custom tokenizer, featurizer, and classifier implemented in my own .py file?
- What are the default pipeline settings (tokenizer, featurizer, classifier model) when nothing is specified?
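For context, here is a minimal sketch of why I suspect tokenization. It assumes the tokenizer splits on whitespace, which I have not confirmed for the default pipeline. The helper `whitespace_tokenize` and both example sentences are made up for illustration; they are not part of the pipeline or my training data.

```python
# Hypothetical helper: a plain whitespace split, used only to illustrate
# the problem. It is not the pipeline's actual tokenizer.
def whitespace_tokenize(text):
    return text.split()


english = "I love the Thai language"
thai = "ฉันรักภาษาไทย"  # illustrative sentence; Thai puts no spaces between words

english_tokens = whitespace_tokenize(english)
thai_tokens = whitespace_tokenize(thai)

print(english_tokens)  # one token per English word
print(thai_tokens)     # the whole Thai sentence comes back as a single token
```

If the tokenizer behaves like this, every Thai sentence becomes one long token, so the featurizer and classifier have almost nothing to work with.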
Thank you for your reply.