Hi, I am trying out the DIETClassifier model with the help of this guide (https://towardsdatascience.com/how-do-chatbots-understand-87227f9f96a7), but I found that it doesn't work well with Portuguese. I took the guide's model, changed the language from en to pt, and plugged in my own train and test data, but now accuracy is very low (15-21% at most). Is there anything else I need to do to make the classifier work well for Portuguese? I don't think the training data is the problem, because I also took the guide's example data (which reached close to 100% accuracy in English), translated it to PT, and still got at most 21% accuracy.
My own dataset has 4 intents and 95 examples in total (17-28 examples per intent). All examples are short phrases and none contain a tilde (~) or ç (Portuguese-specific characters). That dataset gives me only 15% accuracy.
I then translated the small English dataset provided by the guide, which has 3 intents and 23 examples (4-11 examples per intent). That smaller dataset gave me 21% accuracy in PT but close to 100% in EN.
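To put those numbers in context, here is a minimal sketch of a sanity check. It assumes the test split has roughly the same intent balance as the counts above, and the helper name chance_baselines is just something made up for illustration. It compares the reported accuracy against two trivial baselines: guessing uniformly at random, and always predicting the largest intent.

```python
def chance_baselines(num_intents, total_examples, largest_intent):
    """Return (uniform_guess_accuracy, always_majority_accuracy).

    Hypothetical helper: assumes the evaluation data has the same
    intent balance as the counts passed in.
    """
    uniform = 1 / num_intents
    majority = largest_intent / total_examples
    return uniform, majority


# Counts taken from the post: (intents, total examples, largest intent size).
datasets = {
    "own PT dataset (reported 15%)": (4, 95, 28),
    "translated guide dataset (reported 21%)": (3, 23, 11),
}

for name, (n_intents, total, largest) in datasets.items():
    uniform, majority = chance_baselines(n_intents, total, largest)
    print(f"{name}: uniform guess {uniform:.1%}, always-majority {majority:.1%}")
```

Under that assumption, both reported accuracies sit below even uniform guessing (25% and about 33%), which might point to a label or evaluation mismatch somewhere rather than to the language setting alone, though I can't confirm that.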
The only change I am making for Portuguese, besides the dataset, is in the config files: language: pt
I heard I could use a spaCy-based pipeline to try to get better PT performance, something like this:
pipeline:
  - name: SpacyNLP  # Use spaCy, as it supports Portuguese.
    model: "pt_core_news_sm"  # Portuguese language model.
  - name: SpacyTokenizer  # Tokenizer that uses the spaCy model's tokenization.
  - name: SpacyFeaturizer  # Featurizer that uses word vectors from spaCy.
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
Alternatively, I could use pt_core_news_lg. But that bypasses the guide's Python code that saves the model, and I don't know exactly how that would work. How would I save this model to use it later?
Since I am quite new to NLP, would you mind helping me? Maybe we could talk on Telegram or Discord.