Proper number of training examples with similar/close intents?

I’m working on a chat bot that delivers information from a custom knowledgebase. The basic flow: the user asks about a topic and, depending on the classified intent, an answer is pulled from a database. The conversation can branch out, with the info traversed down a branching tree. The intents are general queries like where_at_request, when_request, did_it_happen_request, who_did_it_request, things like that.

Because the intents are that general, some of the questions can end up pretty close to others.

In a case like that, I have a few options for the intent examples for training:

1: Use a few very distinct examples for each question. This seems bad, regardless.

2: Write a bunch of questions and duplicate some, changing some less important words. Examples: “Where is he? Where is she? Where are they? Where is it? Where did it go? Where did he go? Do you know where they are? Any idea where she is? Any clue where it’s at? Can you tell me where she went?”

In this case I’m not sure which words are the most valuable to vary, and which are irrelevant to the training data. For example, if all I’m trying to capture for the intent is “the user is asking where,” do the variations of he/she/they/it and are/is/went/left matter? How much?

3: I’ve been working on a program where I can supply lists of words in a YAML file. For an intent of did_it_happen_request I might have:

```yaml
did_does:
  - did
  - have
  - does
  - has

pronouns:
  - he
  - she
  - it
  - they
  - you

action:
  - do
  - take
  - go
  - see

target:
  - it
  - there
  - here
  - us
```

Then the program forms a sentence from every combination of the 4 groups. There is some logic I didn’t bother to show that makes sure questions like “have she take it” don’t end up in the results.
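For reference, a minimal sketch of that combination step (the word groups are inlined here instead of loaded from the YAML, and `is_grammatical` is a toy stand-in for the filtering logic I left out) looks roughly like:

```python
import itertools

# Word groups from the YAML file, inlined for the sketch.
groups = {
    "did_does": ["did", "have", "does", "has"],
    "pronouns": ["he", "she", "it", "they", "you"],
    "action": ["do", "take", "go", "see"],
    "target": ["it", "there", "here", "us"],
}

def is_grammatical(aux, pronoun, action, target):
    """Toy filter standing in for the real compatibility logic."""
    if aux in ("have", "has"):
        return False  # would need a past participle: "have she take it" -> rejected
    if aux == "does" and pronoun not in ("he", "she", "it"):
        return False  # "does they go there" -> rejected
    return True

# Cartesian product of all 4 groups, filtered for grammar.
sentences = [
    f"{aux} {pronoun} {action} {target}?"
    for aux, pronoun, action, target in itertools.product(*groups.values())
    if is_grammatical(aux, pronoun, action, target)
]

print(len(sentences))  # -> 128 (out of 4*5*4*4 = 320 raw combinations)
```

Even with the filter rejecting more than half the raw combinations, these small lists still produce well over a hundred sentences, which is where the counts below come from.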

However, for some questions with lots of potential targets, the number of resulting sentences can easily hit 400+.

That feels like extreme overkill.

But, is THAT many training examples TOO many?

I haven’t populated the nlu.yml with anything like that yet, but I’ll probably run some side-by-side tests. I also wanted to ask here, though. In general I’ve seen “the more the better,” but I feel like 500 variations of “did the thing/event occur yet” might be excessive?
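One mitigation I’ve considered (just a sketch, not something I’ve tested against real training runs) is to keep the generator but cap what actually lands in nlu.yml, by reproducibly sampling a fixed number of examples per intent:

```python
import random

def cap_examples(sentences, limit=100, seed=0):
    """Keep at most `limit` generated examples, sampled reproducibly.

    A fixed seed means regenerating the training data yields the same
    subset, so test runs stay comparable.
    """
    if len(sentences) <= limit:
        return list(sentences)
    rng = random.Random(seed)
    return rng.sample(sentences, limit)

capped = cap_examples([f"example {i}" for i in range(500)], limit=100)
print(len(capped))  # -> 100
```

That way the side-by-side test could compare, say, 50 vs. 100 vs. the full 400+ per intent without rewriting the generator.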