ResponseSelector low accuracy

Hi all,

I am facing problems with my response selector. Its accuracy is very low (0.02), and of course it gives a lot of false answers. My pipeline is:

language: el
pipeline:
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "bert-base-multilingual-cased"
    cache_dir: packages/langdata
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

I am using it only for a single intent (chitchat), but I have many sentence pairs (> 4000). The documentation says it is similar to the DIETClassifier, but I couldn't figure out what it actually does, and whether the default values of the network (hidden_layers_sizes, embedding_dimension, etc.) are enough for my case…
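For example, this is what I mean about the defaults: overriding them explicitly would look roughly like the snippet below (the values are placeholders, not tuned recommendations, and I don't know whether larger sizes would help with > 4000 pairs):

  - name: ResponseSelector
    epochs: 100
    # placeholder values, not recommendations
    hidden_layers_sizes:
      text: [256, 128]
      label: [256, 128]
    embedding_dimension: 20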

@petasis What version of Rasa are you using?

I am using 1.10.1

How is the performance of the DIETClassifier? It might be that the language model you are using is not the best fit. Did you try any other language model, or even training without any pre-trained language model? Also, it is quite hard to say what is going wrong without looking at the data. Could you maybe share an excerpt of your data?
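For the "without any pre-trained language model" option, a minimal sketch would be something along these lines (just a starting point using sparse features only, not a tuned recommendation):

language: el
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100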

Hi again. An example:

My nlu.md file has:

## intent: chitchat/greetings0en02
- Good Morning
- Good Morning.

My responses.md has:

## greetings0en02
* chitchat/greetings0en02
    - Good day.
    - Good Morning.

I am using bert multilingual embeddings (“bert-base-multilingual-cased”). In rasa shell, I type “good morning”.

2020-09-08 12:26:57 DEBUG    rasa.core.policies.mapping_policy  - The predicted intent 'chitchat' is mapped to  action 'respond_chitchat' in the domain.
2020-09-08 12:26:57 DEBUG    rasa.core.policies.form_policy  - There is no active form
2020-09-08 12:26:57 DEBUG    rasa.core.policies.fallback  - NLU confidence threshold met, confidence of fallback action set to core threshold (0.3).
2020-09-08 12:26:57 DEBUG    rasa.core.policies.ensemble  - Predicted next action using policy_2_MappingPolicy
2020-09-08 12:26:57 DEBUG    rasa.core.processor  - Predicted next action 'respond_chitchat' with confidence 1.00.
2020-09-08 12:26:57 DEBUG    rasa.core.actions.action  - Picking response from selector of type default
2020-09-08 12:26:57 DEBUG    rasa.core.processor  - Action 'respond_chitchat' ended with events '[BotUttered('Thank you!', {"elements": null, "quick_replies": null, "buttons": null, "attachment": null, "image": null, "custom": null}, {}, 1599557217.3225248)]'

The intent is correct, “chitchat”. I am not sure whether the “sub-intent” is correct, but the response selector returns “Thank you!”.

Why isn’t one of “Good day.” or “Good Morning.” selected? Is whatever follows the ‘/’ in the intent name ignored?

@petasis Thanks for providing some examples. Can you try the following config -

language: el
pipeline:
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "bert-base-multilingual-cased"
    cache_dir: packages/langdata
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
    alias: "lmf"
  - name: RegexFeaturizer
    alias: "rf"
  - name: LexicalSyntacticFeaturizer
    alias: "lsf"
  - name: CountVectorsFeaturizer
    alias: "cvf_w"
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
    alias: "cvf_c"
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    featurizers: ["cvf_w", "lmf"]

This basically excludes the features from the RegexFeaturizer, LexicalSyntacticFeaturizer, and char-level CountVectorsFeaturizer from being used by the response selector.

@dakshvar22 Thank you for answering this. To tell the truth, in the meantime I wrote my own response selector, which is much, much simpler. I also started a new post: “I am trying to create a new response selector: How to prepare features for the class?”

What I did is simple: I map the input to an intent (such as chitchat/action1), and then I select a response from all the available responses in the responses.md file.

This works better than Rasa’s response selector (far fewer errors), although it still has some misclassifications. And it solves the problem of some responses never being returned.

I think in my case, where the chitchat data is huge (~23,000 questions and almost as many answers), embedding both questions and answers in the same space and selecting by similarity is not a good strategy.
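In the config, my component is simply registered as a custom NLU component in place of the built-in ResponseSelector, roughly like this (the class path is my own package, not something shipped with Rasa):

  - name: packages.ResponseSelectorThroughIntent.ResponseSelectorThroughIntent
    epochs: 50
    random_seed: 20212020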

@petasis Thanks for that feedback. We have actually made that possible in the upcoming release of Rasa Open Source 2.0. Expect it to be out sometime this week or early next week.

Yes, I saw it. I will try it, probably tomorrow.

@dakshvar22 I have tested the new response selector, but unfortunately it fails:

 2020-10-05 16:12:42 DEBUG    rasa.nlu.selectors.response_selector  - Following metrics will be logged during training: 
2020-10-05 16:12:42 DEBUG    rasa.nlu.selectors.response_selector  -   t_loss (total loss)
2020-10-05 16:12:42 DEBUG    rasa.nlu.selectors.response_selector  -   r_acc (response acc)
2020-10-05 16:12:42 DEBUG    rasa.nlu.selectors.response_selector  -   r_loss (response loss)
2020-10-05 16:12:42 DEBUG    rasa.utils.tensorflow.models  - Building tensorflow train graph...
2020-10-05 16:13:09 DEBUG    rasa.utils.tensorflow.models  - Finished building tensorflow train graph.
Epochs: 100%|=========| 50/50 [23:17<00:00, 27.96s/it, t_loss=4.390, r_loss=4.390, r_acc=0.048]

Accuracy is extremely low. My pipeline is:

pipeline:
  - name: packages.LanguageDetection.LanguageDetection
  - name: HFTransformersNLP
    # Name of the language model to use
    model_name: "bert"
    # Pre-Trained weights to be loaded
    #model_weights: "nlpaueb/bert-base-greek-uncased-v1"
    model_weights: "bert-base-multilingual-uncased"
    cache_dir: packages/langdata
    alias: "embeddings"
  - name: LanguageModelTokenizer
    # Flag to check whether to split intents
    intent_tokenization_flag: False
    # Symbol on which intent should be split
    intent_split_symbol: "_"
  - name: LanguageModelFeaturizer
    alias: "lmf"
  - name: RegexFeaturizer
    # Text will be processed with case sensitive as default
    case_sensitive: True
    alias: "rf"
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
    use_lemma: False
    # Set the out-of-vocabulary token
    OOV_token: "_oov_"
    # Whether to use a shared vocab
    use_shared_vocab: False
  - name: RegexEntityExtractor
  - name: DIETClassifier
    epochs: 50
    random_seed: 20212020
  - name: EntitySynonymMapper
#  - name: packages.ResponseSelectorThroughIntent.ResponseSelectorThroughIntent
#    epochs: 50
#    random_seed: 20212020
  - name: ResponseSelector
    epochs: 50
    random_seed: 20212020
    featurizers: ["lmf"]
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1

What version of Rasa did you try? Also, how many examples for response selector do you have?

rasa --version
Rasa Version     : 2.0.0rc3
Rasa SDK Version : 2.0.0rc1
Rasa X Version   : None
Python Version   : 3.8.5 (default, Aug 12 2020, 00:00:00) 
Operating System : Linux-5.8.12-200.fc32.x86_64-x86_64-with-glibc2.2.5
Python Path      : /usr/bin/python3
intent: flight_departure_info, training examples: 5953   
intent: flight_arrival_info, training examples: 203   
intent: inform, training examples: 2429   
intent: affirm, training examples: 32   
intent: deny, training examples: 19   
intent: stop, training examples: 58   
intent: search_encyclopedia, training examples: 32   
intent: search_weather, training examples: 33   
intent: chitchat_el, training examples: 21488   
intent: insult, training examples: 236   
intent: thank_you, training examples: 102   
intent: chitchat_en, training examples: 1713

The response selector is concerned with chitchat* intents.

@dakshvar22 Is there a document describing how the response selector works in Rasa 2.0rc4? The results are very low, while at the same time a simple DIET classifier (that simply maps the input to a class like chitchat/ask) reaches ~0.98. In reality it is much lower (i.e. when asking all questions, the final accuracy is ~0.8), but still Rasa’s response selector does not seem able to handle the data. Perhaps a problem with the training data format?
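For reference, this is my understanding of the 2.0 YAML format for the earlier greeting example (migrated by hand from nlu.md/responses.md, so it may not be exactly what the migration tool produces):

nlu:
- intent: chitchat/greetings0en02
  examples: |
    - Good Morning
    - Good Morning.

responses:
  utter_chitchat/greetings0en02:
  - text: "Good day."
  - text: "Good Morning."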

Ok, for my case, my classifier seems to work better. So, I tried to simulate this in the Rasa 2.0 ResponseSelector, and the results are promising:

  - name: ResponseSelector
    number_of_transformer_layers: 2
    transformer_size: 256
    scale_loss: False
    use_sparse_input_dropout: True
    use_dense_input_dropout: True
    hidden_layers_sizes:
      text:  []
      label: []
    featurizers: ["cvf_w"]
    use_text_as_label: False
    epochs: 100
    random_seed: 20212020

Switching to a transformer achieves accuracy=0.9742 when evaluated on the training data, which is ok…

Thanks for sharing your results. Good to know it works better with transformer layers. Do you have a test set as well on which you can evaluate?