Rasa NLU intent recognition models in Portuguese

Hi, so I am trying the DIETClassifier model with help from this guide (https://towardsdatascience.com/how-do-chatbots-understand-87227f9f96a7), but I found that it doesn't work well with Portuguese. I basically took the guide's model, changed the language from en to pt, and inserted my own test and train data, but now I have very low accuracy (15-21% at most). Is there anything else I would need to do to make my classifier models work well for Portuguese? I also don't think it's the training data, because I copy-pasted the guide's example data (which in English had close to 100% accuracy), translated it to pt, and still got a maximum of 21% accuracy.

Here's my config file:

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    entity_recognition: false
    intent_classification: true
    epochs: 100
  - name: classifier2.LRIntentClassifier
    max_iter: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 200
    constrain_similarities: true
    entity_recognition: false
  - name: FallbackClassifier
    threshold: 0.7
    ambiguity_threshold: 0.1
```
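
For context, my training data is in the standard Rasa NLU YAML format. A shortened, made-up sample of the kind of phrases I have (the greeting/goodbye intents here are just illustrative, not my real data):

```yaml
version: "3.1"   # or "2.0", depending on the Rasa version
nlu:
  - intent: saudacao    # "greeting"
    examples: |
      - olá
      - bom dia
      - boa tarde, tudo bem?
  - intent: despedida   # "goodbye"
    examples: |
      - tchau
      - até logo
      - tenho que ir, até amanhã
```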

Rasa is language agnostic. That said, how many examples do you have in each intent?


Hi, on my own dataset I have 4 intents and a total of 95 examples (ranging from 17 to 28 examples per intent). All the examples are short phrases, and none contain ~ (tilde) or ç (Portuguese-specific characters). That dataset gives me only 15% accuracy.

Then I translated the small English dataset provided by the guide, which has 3 intents and 23 examples (4-11 examples per intent). That smaller dataset gave me 21% accuracy in PT but close to 100% in EN.

The only change I am making for Portuguese, besides the dataset, is in the config file: language: pt
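
i.e. the top of my config.yml now looks like this, with the same pipeline as above:

```yaml
language: pt

pipeline:
  - name: WhitespaceTokenizer
  # ... rest of the pipeline shown above
```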

I heard I could use a spaCy language model to try to get better PT performance, something like this:

```yaml
pipeline:
  - name: SpacyNLP            # use spaCy, as it supports Portuguese
    model: "pt_core_news_sm"  # Portuguese language model
  - name: SpacyTokenizer      # tokenizer that uses the spaCy model's tokenization
  - name: SpacyFeaturizer     # featurizer that uses word vectors from spaCy
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
```

or alternatively using pt_core_news_lg. But that bypasses the guide's Python code that saves the model, and I don't know exactly how that would work. How would I save this model to use later?
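
If I understand the docs right, that pipeline would also still need an intent classifier at the end (the spaCy components only tokenize and featurize), and the spaCy model has to be installed first with `python -m spacy download pt_core_news_sm`. My guess at a complete config would be something like this (untested, with the epochs and n-gram settings just copied over from my current config):

```yaml
language: pt

pipeline:
  - name: SpacyNLP
    model: "pt_core_news_sm"   # or pt_core_news_lg for larger word vectors
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier       # still needed: the components above only featurize
    epochs: 100
```

And as far as I can tell, `rasa train nlu` saves the trained model as a .tar.gz under models/ either way, which can be loaded later with `rasa shell nlu --model models/<file>.tar.gz`, so maybe that replaces the guide's Python saving code?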

Since I am quite new to NLP, would you mind helping me? Maybe we can talk on Telegram or Discord.