Response_selector's accuracy very low

vitalyuf · April 29, 2020, 7:22am

I upgraded Rasa from version 1.5.1 to 1.9.6.

I have a custom fastText-based featurizer in my pipeline. I modified it so that DIET can be trained on its custom features. DIET shows high scores (i_acc, e_f1) during training.

But the ResponseSelector's r_acc is very low and looks random (it varies from approximately 0.001 to 0.1), and it does not increase during training. What could be the reason?

dakshvar22 · April 29, 2020, 12:17pm

Hi @vitalyuf Can you please share the pipeline configuration that you use with us?

Thanks

vitalyuf · April 30, 2020, 2:13am

Hi, @dakshvar22!

Yes, the pipeline is:

pipeline:
- name: "WhitespaceTokenizer"
- name: "yvi_imports.features.FTDenseFeaturizer"
- name: DIETClassifier
  epochs: 100
- name: "ResponseSelector"
  epochs: 1000

Also the code of FTDenseFeaturizer.train is:

...
from numpy import array
...
    def train(self, training_data, cfg, **kwargs):

        for example in training_data.training_examples:
            if 'response_tokens' in example.as_dict_nlu().keys():
                ft_vector = array(self._ft_embedder(
                    [tok.text for tok in example.as_dict_nlu()['response_tokens'][:-1]]
                    + [' '.join([tok.text for tok in example.as_dict_nlu()['response_tokens']])],
                    mean=True))
            else:
                ft_vector = [None]
            feats = self._combine_with_existing_dense_features(example, ft_vector)
            example.set("response_dense_features", feats)

        ft_vectors = [
            array(self._ft_embedder(
                [tok.text for tok in ex.as_dict_nlu()['tokens'][:-1]] + [ex.as_dict_nlu()['text']],
                mean=True))
            for ex in training_data.training_examples
        ]
        for example, vector in zip(training_data.training_examples, ft_vectors):
            feats = self._combine_with_existing_dense_features(example, vector)
            example.set("text_dense_features", feats)

dakshvar22 · April 30, 2020, 5:17am

The expression [tok.text for tok in example.as_dict_nlu()['response_tokens'][:-1]]+[' '.join([tok.text for tok in example.as_dict_nlu()['response_tokens']])] would transform a response like I am feeling better into ['I', 'am', 'feeling', 'better', 'I am feeling better']. I assume mean=True would also include the embedding of the whole sentence (the last element of the list) when taking the mean. That can be problematic.

If you share the code for the featurizer that you used in version 1.5.1, I can help you adapt it for 1.9.6.

Also, try setting scale_loss=False in the configuration of the response selector and see if that helps.

vitalyuf · April 30, 2020, 6:41am

@dakshvar22, thank you very much!

The error was in my code.

My embedder _ft_embedder accepts a list of token lists and returns an embedding vector for each list. For ['I', 'am', 'feeling', 'better', 'I am feeling better'] it should return a separate embedding vector for every element, treating each element as a list of tokens. But all the elements are strings. I forgot to convert them to lists of tokens, so they were treated as lists of characters. Stupid mistake.

Now ResponseSelector training shows r_acc=0.845, and it grows steadily with the epoch number.
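
The mistake described above is easy to reproduce. The following is a minimal sketch, not the actual featurizer: toy_embedder and toy_word_vector are hypothetical stand-ins for _ft_embedder and the fastText model, and the vectors are made up. It shows how passing plain strings where token lists are expected makes the embedder average over characters:

```python
def toy_word_vector(word):
    """Map a word to a deterministic 2-d vector (made up, not fastText)."""
    return [float(len(word)), float(sum(ord(ch) for ch in word) % 10)]


def toy_embedder(token_lists):
    """Return one mean vector per inner sequence.

    Each item of an inner sequence is treated as a token, so a plain
    string is silently iterated character by character.
    """
    vectors = []
    for tokens in token_lists:
        word_vectors = [toy_word_vector(tok) for tok in tokens]
        dim = len(word_vectors[0])
        mean = [sum(v[i] for v in word_vectors) / len(word_vectors) for i in range(dim)]
        vectors.append(mean)
    return vectors


tokens = ["I", "am", "feeling", "better"]

# Bug: every element is a plain string, so "feeling" is embedded as the
# mean of the vectors of its characters.
buggy = toy_embedder(tokens + [" ".join(tokens)])

# Fix: wrap each token in its own list and pass the sentence as a token list.
fixed = toy_embedder([[tok] for tok in tokens] + [tokens])

print(buggy[2])  # mean over the characters of "feeling"
print(fixed[2])  # vector of the single token "feeling"
```

Both calls return five vectors of the same size, so the bug raises no error and only shows up as poor training metrics.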

dakshvar22 · April 30, 2020, 6:56am

Out of curiosity, are you using a sentence embedding model or a word embedding model? If you are using a word embedding model, shouldn't the list passed to _ft_embedder be [['I', 'am', 'feeling', 'better']]? Taking the mean over the word vectors and setting that as the feature vector seems appropriate then.

vitalyuf · April 30, 2020, 7:20am

Yes, _ft_embedder is an implementation of a word embedding model.

I use it to speed up debugging because it is faster than the embedders I used initially, ELMo and BERT (on Rasa 1.5.1). After I figure out how to prepare features for DIET and for ResponseSelector, I will integrate ELMo or BERT into my featurizer.

@dakshvar22, am I right that the featurizer should provide DIET and ResponseSelector with lists of vectors (text_features and response_features), where every vector in a list matches a token provided by the tokenizer component, except the last one, which should be a vector matching the whole text?

For example, if I need to prepare features for the utterance 'I am feeling better', should I set the text features to [vector_for(['I']), vector_for(['am']), vector_for(['feeling']), vector_for(['better']), vector_for(['I', 'am', 'feeling', 'better'])]?

dakshvar22 · April 30, 2020, 7:51am

@vitalyuf Yes, that is exactly right.

Do try scale_loss=False for the response selector and see if it helps.
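
As a rough illustration of the layout confirmed above (one vector per token, followed by one vector for the whole text), here is a sketch in plain Python. toy_word_vector is a hypothetical stand-in for a real word-embedding lookup, and the sentence vector is assumed to be the mean of the token vectors, as suggested earlier in the thread:

```python
def toy_word_vector(word):
    """Map a word to a deterministic 2-d vector (made up for illustration)."""
    return [float(len(word)), float(sum(ord(ch) for ch in word) % 10)]


def mean_vector(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def build_dense_features(tokens):
    """Return one row per token, then one final row for the whole utterance."""
    token_rows = [toy_word_vector(tok) for tok in tokens]
    sentence_row = mean_vector(token_rows)
    return token_rows + [sentence_row]


features = build_dense_features(["I", "am", "feeling", "better"])
print(len(features), len(features[0]))  # 5 2
```

A real featurizer would typically return a NumPy array of shape (number of tokens + 1, embedding size) instead of nested lists.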

vitalyuf · April 30, 2020, 8:19am

OK, the results are as follows.

config:

- name: "WhitespaceTokenizer"
- name: "yvi_imports.features.FTDenseFeaturizer"
- name: DIETClassifier
  epochs: 20
- name: "ResponseSelector"
  epochs: 2000

train output:

2020-04-30 14:55:06 INFO rasa.nlu.selectors.response_selector - Retrieval intent parameter was left to its default value. This response selector will be trained on training examples combining all retrieval intents.
Epochs: 100%|█████| 2000/2000 [02:09<00:00, 16.15it/s, t_loss=2.720, r_loss=1.265, r_acc=0.988]

config:

- name: "WhitespaceTokenizer"
- name: "yvi_imports.features.FTDenseFeaturizer"
- name: DIETClassifier
  epochs: 20
- name: "ResponseSelector"
  scale_loss: False
  epochs: 2000

train output:

2020-04-30 15:01:01 INFO rasa.nlu.selectors.response_selector - Retrieval intent parameter was left to its default value. This response selector will be trained on training examples combining all retrieval intents.
Epochs: 100%|█████| 2000/2000 [02:09<00:00, 15.92it/s, t_loss=2.429, r_loss=1.730, r_acc=0.575]

vitalyuf · April 30, 2020, 9:02am

But if I combine several featurizers in one pipeline (RegexFeaturizer, LexicalSyntacticFeaturizer, CountVectorsFeaturizer, "yvi_imports.features.FTDenseFeaturizer"), adding scale_loss: False seems to give higher accuracy and lower loss than the same multi-featurizer pipeline without scale_loss: False.

Som · December 7, 2020, 2:55pm

I am also facing the same issue. DIETClassifier training accuracy is 0.98, but ResponseSelector accuracy is 0.048 after 100 epochs. The dataset has around 20k FAQs. Please guide me as well. How can I increase r_acc?

petasis · March 2, 2021, 1:33pm

Me also…

evilc3 · May 24, 2021, 6:34am

@dakshvar22 I can't find the scale_loss argument in the docs for version 2.6. My problem is the opposite: training shows an accuracy of 1, but the loss stays in the range of 5.9 to 6.5. Can you tell me why this is happening?

dakshvar22 · June 8, 2021, 11:32am

@evilc3 Do you have a lot of examples which are very similar across different sub-intents of a retrieval intent? Training loss being high is an indication of that. Also, what’s the configuration pipeline that you are using?
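
To illustrate how accuracy can be 1 while the loss stays high, here is a sketch with a plain softmax cross-entropy. This is not Rasa's actual loss implementation, and the numbers (1000 candidate responses, a score margin of 0.1) are assumptions chosen for illustration. When many candidates score almost as high as the correct one, the correct one still wins the argmax, but its probability is tiny, so the loss is close to log(number of candidates):

```python
import math


def softmax_cross_entropy(logits, correct_index):
    """Cross-entropy of a softmax over `logits` for the correct candidate."""
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[correct_index]


# Assumed similarity scores: the correct response (index 0) scores only
# marginally higher than 999 very similar alternatives.
num_candidates = 1000
logits = [0.1] + [0.0] * (num_candidates - 1)

predicted = max(range(num_candidates), key=lambda i: logits[i])
loss = softmax_cross_entropy(logits, correct_index=0)

print(predicted == 0)  # True: the prediction counts as correct
print(round(loss, 2))  # roughly 6.8, close to log(1000)
```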

makama-md · October 18, 2021, 11:27am

Hi @dakshvar22 I have the same issue. My config pipeline is:

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: "char_wb"
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 500
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 500
# Other components
- name: ResponseSelector
  epochs: 500
  retrieval_intent: faq

policies:
- name: RulePolicy
  core_fallback_threshold: 0.4
  core_fallback_action_name: "action_default_fallback"
  enable_fallback_prediction: True
  constrain_similarities: True

ChrisRahme · October 18, 2021, 12:30pm

Not sure if this will solve your specific problem, @makama-md, but keep in mind that more epochs do not necessarily mean better accuracy.

You have 500 epochs, and previous posts have 1000 or even 2000. Training for that long can cause overfitting.

Read about TensorBoard in Rasa to see how to optimize your pipeline and the effects of changing it. Keep in mind that the measurement you want to maximize is not training accuracy, but testing/validation accuracy.
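
The point about training versus validation accuracy can be sketched as follows. The accuracy values are invented for illustration and do not come from any run in this thread; best_epoch is a hypothetical helper, not part of Rasa:

```python
def best_epoch(epochs, accuracy):
    """Return the (epoch, accuracy) pair with the highest accuracy.

    Ties go to the earliest epoch, since training longer gains nothing.
    """
    best_index = max(range(len(epochs)), key=lambda i: accuracy[i])
    return epochs[best_index], accuracy[best_index]


# Invented numbers: accuracy measured every 100 epochs.
epochs = [100, 200, 300, 400, 500]
train_acc = [0.60, 0.85, 0.95, 0.99, 1.00]
val_acc = [0.55, 0.72, 0.78, 0.76, 0.71]

# Training accuracy keeps rising, but validation accuracy peaks and then
# drops, which is the usual sign of overfitting.
print(best_epoch(epochs, train_acc))  # (500, 1.0)
print(best_epoch(epochs, val_acc))    # (300, 0.78)
```

Choosing the epoch count from the validation curve rather than the training curve would stop this hypothetical run at 300 epochs instead of 500.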

makama-md · October 18, 2021, 1:20pm

thanks

stritchi · March 9, 2022, 1:37pm

I have the same problem with one specific retrieval_intent; the other defined retrieval intents have good accuracy. If I train only that retrieval_intent, without the training data from the other ones, it trains well.
