Retrieval intent returns very low confidence even when testing with a sample from the training data

Hi all,

I’m using Rasa to build a chatbot that answers customer FAQs, following this guide: Chitchat and FAQs

I’ve trained the model successfully on ~300 FAQs, but when testing the NLU I always get the intent with very low confidence (10–20%), even when I test with a sample taken from the training data. Sometimes it also returns the wrong intent for a training sample.

My system:

Rasa Version      :         3.6.19
Minimum Compatible Version: 3.5.0
Rasa SDK Version  :         3.6.2
Python Version    :         3.10.12
Operating System  :         Linux-5.15.133.1-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python Path       :         /opt/venv/bin/python

Files: config.yml (1.2 KB) domain.yml (1.9 KB) nlu.yml (338.1 KB)
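
For context, the config.yml is basically the default pipeline from that guide, with a ResponseSelector for the faq retrieval intent. A simplified sketch only; the attached config.yml is the actual file, so the exact components and epochs may differ:

  pipeline:
    - name: WhitespaceTokenizer
    - name: RegexFeaturizer
    - name: LexicalSyntacticFeaturizer
    - name: CountVectorsFeaturizer
    - name: CountVectorsFeaturizer
      analyzer: char_wb
      min_ngram: 1
      max_ngram: 4
    - name: DIETClassifier
      epochs: 100
    - name: ResponseSelector        # handles the faq/* retrieval intents
      epochs: 100
      retrieval_intent: faq
    - name: FallbackClassifier
      threshold: 0.3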

Could you tell me what I need to do to increase the accuracy? Also, my full FAQ set contains ~3,000 FAQs; can Rasa handle that well in my case?

Thank you for your help.

I would read about NLU testing, and intent confusion in particular. There are docs on generating a confusion matrix here.

There’s a blog post on testing here.

Thank you for your reply.

Here are the files generated after following the docs.

DIETClassifier_report.json (2.0 KB)

DIETClassifier_errors.json (7.6 KB) intent_errors.json (170 Bytes) intent_report.json (580 Bytes) response_selection_errors.json (66.4 KB) response_selection_report.json (55.2 KB)

It seems there is a lot of confusion, but I don’t know how to improve the results. Could you please give me some advice?

Thank you for your help.

The response_selection_errors.json shows the problem. For example, in the entry below the two intents are easily confused, and judging by the intent titles alone I would expect them to be, since they are so similar. You could

  • combine them and answer both questions in one response (see the nlu.yml sketch after the example below)
  • separate them more clearly by making the example utterances for each intent more clearly distinct
  • switch to a RAG approach like Rasa Pro’s Enterprise Search.
  {
    "text": "chỉ định da của chị phải căng chỉ và tiêm botox mới cải thiện, làm nhiều dịch vụ như thế mặt chị có đơ không em?",
    "intent_response_key_target": "faq/ask_cang_da_bang_chi_mat_co_bi_do_khong_do_tuoi_de_cang_da_chi_la_bao_nhieu_",
    "intent_response_key_prediction": {
      "name": "faq/ask_cang_da_mat_bang_chi_co_gay_nguy_hiem_khong_",
      "confidence": 0.08167824149131775
    }
  },
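
For the first two options, the change is only in your NLU data and responses. Here is a rough sketch of the combining option, reusing the intent key and the utterance from your error report; the response text is a placeholder for you to fill in:

  # nlu.yml: keep one of the two retrieval intents and move all example
  # utterances of the near-duplicate intent under it
  nlu:
  - intent: faq/ask_cang_da_bang_chi_mat_co_bi_do_khong_do_tuoi_de_cang_da_chi_la_bao_nhieu_
    examples: |
      - chỉ định da của chị phải căng chỉ và tiêm botox mới cải thiện, làm nhiều dịch vụ như thế mặt chị có đơ không em?

  # domain.yml: a single response that answers both questions
  responses:
    utter_faq/ask_cang_da_bang_chi_mat_co_bi_do_khong_do_tuoi_de_cang_da_chi_la_bao_nhieu_:
    - text: "..."

Once the examples have been moved over, delete the faq/ask_cang_da_mat_bang_chi_co_gay_nguy_hiem_khong_ intent and its response.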

Thank you for your advice.

I will rework the NLU data to make it clearer and try again.

Regarding the second question: my NLU data will eventually contain ~3K FAQs like these. Can Rasa handle that well?

With that many FAQs I would use a RAG approach.
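
Concretely, in Rasa Pro that would mean using the EnterpriseSearchPolicy, so answers are retrieved from your FAQ documents at runtime instead of training ~3K retrieval intents. A rough sketch of the config.yml addition, assuming the default local FAISS setup; check the Rasa Pro docs for the exact keys in your version:

  policies:
    - name: EnterpriseSearchPolicy
      vector_store:
        type: "faiss"       # default local vector store
        source: "./docs"    # folder containing your FAQ documents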

Thank you again.

I will also try the RAG approach after cleaning up the current data, to see whether the results improve.