The training data I use are mostly short phrases, which is the expected input from users, as I'm working on a sort of traditional + NLU hybrid Discord bot for my Final Year Project. Users mostly type phrases like "iq" or "roast me", with the occasional "what is the weather in London". Rasa seems to treat common words such as "is", "are", and "what" as indicative of a particular intent.
On the other hand, I realized that by putting only key phrases that are often mentioned into the training data, e.g. "weather in Berlin", "animal pictures", and "hello", the NLU model actually performs better!
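For reference, here's a rough sketch of what that key-phrase-only training data might look like in Rasa 2.x NLU YAML format (the intent names are just placeholders I made up, not anything from my actual project):

```yaml
version: "2.0"
nlu:
  - intent: ask_weather
    examples: |
      - weather in Berlin
      - weather in London
  - intent: animal_pictures
    examples: |
      - animal pictures
      - cat pics
  - intent: greet
    examples: |
      - hello
      - hi
```

The idea is that every example is already a distinctive key phrase, so there are no filler words like "what is the" for the featurizer to latch onto.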
As I'm not using the full Rasa Open Source stack or Rasa X, but only the NLU component (I personally find it more flexible to customize what each intent does, as well as to add non-NLU functions, since custom actions feel a bit more complicated), I'm essentially adding data to the training set manually. This also makes conversation tests unsuitable for me. I could spend a bunch of time writing up training data that includes common words like "are" or "is" to cover the occasional unusual input, or I could just use key phrases, which perform significantly better, at least in my early testing.
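In case it helps anyone doing the same NLU-only setup: the "customize what each intent does" part can be as simple as a dict mapping intent names to plain Python functions, with a confidence cutoff as a fallback. This is just a sketch of my general approach, with a mocked parse result; the handler names, intents, and threshold are all made up, and the only assumption is that the parse result follows Rasa's `{"intent": {"name": ..., "confidence": ...}}` shape:

```python
# Hypothetical intent handlers -- each is an ordinary function,
# so non-NLU features can live alongside them in the same bot.
def weather_handler(text):
    return "Fetching weather..."

def roast_handler(text):
    return "You asked for it."

HANDLERS = {
    "ask_weather": weather_handler,
    "roast": roast_handler,
}

CONFIDENCE_THRESHOLD = 0.6  # arbitrary cutoff, tune for your own data

def dispatch(parse_result, text):
    """Route a parsed NLU result to its handler, or fall back."""
    intent = parse_result["intent"]
    if intent["confidence"] < CONFIDENCE_THRESHOLD:
        return "Sorry, I didn't catch that."
    handler = HANDLERS.get(intent["name"])
    if handler is None:
        return "Sorry, I didn't catch that."
    return handler(text)

# Mocked parse result in place of a real NLU model call:
result = {"intent": {"name": "ask_weather", "confidence": 0.92}}
print(dispatch(result, "weather in Berlin"))
```

In the real bot the mocked dict would come from the trained NLU model's parse output, but the routing logic stays the same either way.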