Large number of entities per example & cross validation expectations

daxaxelrod · April 5, 2019, 8:27pm

Hi! First, I want to say how great Rasa’s documentation is. I also love the deep-dive articles you’ve put together.

I just want to verify that I’m on the right track (without flagging compliance issues) and figured I’d ask here, since no one in my department has worked with Rasa. Our examples are quite lengthy (1-2 paragraphs), with 8-12 different entities that need to be extracted for this project to work. I’ve been manually labeling all our training and test data (I really want Prodigy but am settling for rasa-nlu-trainer), and the training set has about 230 examples so far. We are only focused on one use case right now, so all my examples are labeled with the same intent.
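For context, here is roughly what one labeled example looks like in the Rasa NLU JSON training format. This is only a minimal sketch: the sentence, the entity names, and the `make_example` helper are made up for illustration (our real examples are 1-2 paragraphs with 8-12 entities), and it assumes each entity value appears only once in the text.

```python
import json


def make_example(text, intent, spans):
    """Build one training example in the Rasa NLU JSON format.

    `spans` is a list of (entity_name, substring) pairs. Offsets are taken
    from the first occurrence of each substring, so this hypothetical helper
    assumes every entity value appears only once in the text.
    """
    entities = []
    for name, value in spans:
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in text")
        entities.append(
            {"start": start, "end": start + len(value), "value": value, "entity": name}
        )
    return {"text": text, "intent": intent, "entities": entities}


# Made-up sentence and entity names, not real project data.
text = "Please move 500 USD from account 12345 to account 67890 on Friday."
example = make_example(
    text,
    "intent_1",
    [
        ("amount", "500"),
        ("currency", "USD"),
        ("source_account", "12345"),
        ("target_account", "67890"),
        ("date", "Friday"),
    ],
)

# Sanity check: every labeled span must match the text it points at.
for ent in example["entities"]:
    assert text[ent["start"]:ent["end"]] == ent["value"]

training_data = {"rasa_nlu_data": {"common_examples": [example]}}
print(json.dumps(training_data, indent=2))
```

The offset check is the part that matters with long examples: a single shifted `start`/`end` pair is easy to miss when labeling by hand.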

See the image below for training results. I have a couple of questions:

1. I expected diminishing returns, but it’s a bit discouraging. Is this level of diminishing returns expected with longer examples? Should I aim to double my examples to ~500, or quadruple them to ~1000?
2. Am I screwing myself by going for just one intent right now? Let’s say next month we add intent_2, and we have ~1000 (go big or go home) examples for intent_1. Would I likely have to add a similar number of examples for intent_2? My guess is that it depends on how similar the intents are.

Any help would be greatly appreciated.
