Hi! I’m currently working on a Rasa utility chatbot, and I want to add nlu_fallback as an intent to the confusion matrix.
I am not using the train_test_split method; instead, I have manually curated validation data (from real-life examples), some of which is intentionally there to test the model’s fallback system. The reason I’m not putting the real-life data into my training set is that I found it actually messes with my predictions.
I’ve looked up the following page:
But I couldn’t find anything related to what I need. Could anyone help out?
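To make it a bit more concrete, what I’d like to end up with is roughly this (just a rough sketch I put together outside of rasa test, assuming FallbackClassifier is in my pipeline on Rasa 2.x; the model path and the example messages are placeholders, not my real data):

```python
# Rough sketch (Rasa 2.x, NLU only): run my curated validation examples through
# the trained model and build a confusion matrix that keeps nlu_fallback as a label.
from rasa.nlu.model import Interpreter
from sklearn.metrics import confusion_matrix

interpreter = Interpreter.load("./models/nlu")  # unpacked NLU model directory (placeholder path)

# curated real-life examples; "nlu_fallback" marks messages that *should* fall back
validation = [
    ("roast me", "roast"),
    ("what is the weather in London", "weather"),
    ("purple monkey dishwasher", "nlu_fallback"),
]

y_true = [intent for _, intent in validation]
# with FallbackClassifier in the pipeline, low-confidence messages come back as nlu_fallback
y_pred = [interpreter.parse(text)["intent"]["name"] for text, _ in validation]

labels = sorted(set(y_true) | set(y_pred))  # nlu_fallback stays in the label set
print(labels)
print(confusion_matrix(y_true, y_pred, labels=labels))
```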
We’ve actually explicitly excluded the nlu_fallback intent from the confusion matrix so that it surfaces the actual intent the message was confused with.
When you want to test the fallback behavior of the model, you should write conversation tests.
The reason I’m not putting the real-life data into my training set is that I found it actually messes with my predictions.
This sounds very dangerous. How does it mess with your predictions? Did you add enough training data for each intent? Synthetic training data might give you stable tests, but then you just move all of the prediction error onto your users.
The training data I use is mostly short phrases, which is the kind of input I expect from users, as I’m working on a sort of traditional + NLU hybrid Discord bot for my Final Year Project. Usually, phrases like “iq” or “roast me” are used, but the occasional “what is the weather in London” comes up as well. Rasa seems to treat common words such as “is”, “are”, and “what” as indicative of a particular intent.
On the other hand, I realized that by putting only frequently mentioned key phrases into the training data, e.g. “weather in Berlin”, “animal pictures”, and “hello”, the NLU model actually performs better!
As I’m not using the full Rasa Open Source stack or Rasa X, but only NLU (I personally find it more flexible for customizing what each intent does and for adding non-NLU functions, since custom actions feel a bit more complicated), I’m essentially adding data to the training set manually. This also makes conversation tests unsuitable for me. I could spend a bunch of time writing training data containing common words like “are” or “is” to cover the occasional weird input, or I could just use key phrases, which performs significantly better, at least in my early testing.
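For reference, the way I hook NLU into the bot is roughly like this (only a sketch; the handler names and the 0.6 threshold are illustrative placeholders, not my exact code):

```python
# Minimal sketch of the NLU-only dispatch in my Discord bot (Rasa 2.x).
from rasa.nlu.model import Interpreter

interpreter = Interpreter.load("./models/nlu")  # placeholder model path

# each intent maps directly to a plain Python function instead of a custom action
HANDLERS = {
    "weather": lambda text, entities: f"looking up the weather: {entities}",
    "animal_pictures": lambda text, entities: "here, have a cat picture",
}
FALLBACK_THRESHOLD = 0.6  # illustrative value; confidence below this falls back

def handle_message(text: str) -> str:
    result = interpreter.parse(text)
    intent = result["intent"]["name"]
    confidence = result["intent"]["confidence"]
    if intent not in HANDLERS or confidence < FALLBACK_THRESHOLD:
        # this is the fallback path my curated validation examples are meant to hit
        return "Sorry, I didn't get that."
    return HANDLERS[intent](text, result["entities"])
```

Since the intent-to-function mapping lives in my own code rather than in stories, there isn’t really a conversation to test, only single NLU turns.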
Mhm, this means you’re basically removing stop words manually from your training data?
How many examples do you have per intent? I’m worried that your approach only works with you as the user but won’t generalize to other people using your assistant.
Not necessarily just stop words, if my understanding of stop words is correct; I also remove words that may show up in other contexts, such as a greeting inside a command (e.g. turning “Hey bot, can you do something something” into “do something something”). I find that this helps a lot with tuning fallback thresholds.
As for the generalizing part, I totally understand your concern! However, as the bot I’m developing is a Discord bot, which people tend to talk to differently than the bots you’d see on a banking website, I find that this approach seems to yield the best results. I’ve had some experienced Discord users, new users, and even some people from the Rasa forum help test it out, and it seems to work well for all of them.
I have around 5 to 20-ish examples for each intent, depending on its needs. I suppose that, in a way, I’m using Rasa NLU as an all-in-one fuzzy matcher/entity extractor/intent classifier, since I find that it copes really well with what I’m trying to do.