Adding nlu_fallback as an intent for rasa test nlu?

Hi! I’m currently working on a Rasa utility chatbot and I want to add nlu_fallback as an intent to the confusion matrix.

I am not using the train_test_split method; instead, I have manually curated validation data (from real-life examples), and some of it is intentionally designed to test the model’s fallback system. The reason I’m not putting the real-life data into my training set is that I found it actually messes with my predictions.
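To illustrate the setup (the file name and intents below are placeholders, not my real data): I pass a hand-written file to `rasa test nlu` with the `--nlu` flag, and label the fallback probes as `nlu_fallback`:

```yaml
# data/validation.yml -- hand-curated test set (illustrative only)
# Run with: rasa test nlu --nlu data/validation.yml
version: "3.1"
nlu:
- intent: weather
  examples: |
    - what is the weather in London
- intent: roast
  examples: |
    - roast me
# Deliberately out-of-scope messages, meant to hit the fallback:
- intent: nlu_fallback
  examples: |
    - colorless green ideas sleep furiously
```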

I’ve looked up the following page:

But I couldn’t find anything related to what I need. Could anyone help out?

Thank you guys in advance!

We’ve actually explicitly excluded the nlu_fallback intent from the confusion matrix, so that the matrix reveals the actual intent each message was confused with.

When you want to test the fallback behavior of the model, you should write conversation tests.
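A minimal sketch of such a conversation test (the story name and trigger message are made up, and `action_default_fallback` assumes the default fallback configuration):

```yaml
# tests/test_stories.yml -- illustrative fallback conversation test,
# run with: rasa test
stories:
- story: out-of-scope message triggers fallback
  steps:
  - user: |
      colorless green ideas sleep furiously
    intent: nlu_fallback
  - action: action_default_fallback
```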

The reason I’m not putting the real-life data into my training set is that I found it actually messes with my predictions.

This sounds very dangerous. How does it mess with your predictions? Did you add enough training data for each intent? Synthetic training data might help you get stable tests, but then you just move all the prediction error onto your users.

The training data I use is mostly short phrases, which is the expected input from users, as I’m working on a sort of traditional + NLU hybrid Discord bot for my Final Year Project. Usually, phrases like “iq” or “roast me” are used, but the occasional “what is the weather in London” comes up as well. Rasa seems to treat common words such as “is”, “are”, and “what” as indicative of a particular intent.

On the other hand, I realized that by putting only key phrases that are often mentioned into the training data, e.g. “weather in Berlin”, “animal pictures”, and “hello”, the NLU model actually performs better!
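For illustration, the training data ends up looking something like this (the intent names are just examples from my bot):

```yaml
# nlu.yml -- key-phrase-only training examples (illustrative)
version: "3.1"
nlu:
- intent: weather
  examples: |
    - weather in Berlin
    - weather in London
- intent: animal_pictures
  examples: |
    - animal pictures
    - cat pics
- intent: greet
  examples: |
    - hello
```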

As I’m not using full Rasa Open Source or Rasa X, but only Rasa NLU (I personally find it more flexible to customize what each intent does, as well as to add non-NLU functions, since custom actions feel a bit more complicated), I’m essentially manually adding data to the training set. This also makes conversation tests unsuitable for me. I could spend a bunch of time writing training data with common words like “are” or “is” to cover the occasional weird input, or I could just use key phrases, which perform significantly better, at least in my early testing experience.
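To make the hybrid setup concrete, here is a rough sketch of the dispatch logic, assuming an NLU-only server started with `rasa run --enable-api`; the handler functions and replies are made up for illustration, and only the `/model/parse` endpoint is Rasa’s actual HTTP API:

```python
# Hypothetical dispatch for a traditional + NLU hybrid bot.
# Assumes a Rasa server is running locally on the default port:
#   rasa run --enable-api
import requests

PARSE_URL = "http://localhost:5005/model/parse"

def handle_message(text: str) -> str:
    # Ask the NLU model which intent the message expresses.
    result = requests.post(PARSE_URL, json={"text": text}).json()
    intent = result["intent"]["name"]

    # With a FallbackClassifier in the pipeline, low-confidence
    # predictions come back as the synthetic intent "nlu_fallback".
    if intent == "nlu_fallback":
        return "Sorry, I didn't get that."

    # Plain Python dispatch instead of Rasa custom actions.
    handlers = {
        "weather": lambda: "Fetching the weather...",
        "animal_pictures": lambda: "Here is a cute animal!",
        "greet": lambda: "Hello!",
    }
    handler = handlers.get(intent)
    return handler() if handler else f"No handler for intent: {intent}"

if __name__ == "__main__":
    print(handle_message("weather in Berlin"))
```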