Large number of entities per example & cross validation expectations

Hi! First, I want to say how great Rasa’s documentation is. I also love the deep-dive articles you’ve put together.

I just want to verify that I’m on the right track (without flagging compliance issues) and figured I’d ask here, since no one in my department has worked with Rasa. Our examples are quite lengthy (1-2 paragraphs), with 8-12 different entities that need to be extracted for this project to work. I’ve been manually labeling all our training and test data (I really want Prodigy but am settling for rasa-nlu-trainer), and the training set currently has about 230 examples. We are only focused on one use case right now, so all my examples are labeled with the same intent.

See the image below for training results. I have a couple of questions:


  1. I expected diminishing returns, but it’s a bit discouraging. Is this level of diminishing returns expected with longer examples? Should I aim to double my examples to ~500, or quadruple them to ~1000?

  2. Am I screwing myself by going for just one intent right now? Let’s say next month we add intent_2 and we have ~1000 (go big or go home :smiley: ) examples for intent_1. Would I likely have to add a similar number of examples for intent_2? My guess is that it depends on how similar the intents are.
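To put numbers on the diminishing returns from question 1, one option is to train on nested subsets of the labeled data and compare the score gained per extra example. Below is a minimal Python sketch of that idea. It assumes a hypothetical `train_and_score` callback that trains on a subset and returns a single score (for example entity F1 on a fixed test set); that callback is a placeholder to be filled in, not a Rasa API.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def nested_subsets(examples: Sequence, sizes: Sequence[int], seed: int = 0) -> Dict[int, List]:
    """Return random subsets where each larger subset contains the smaller ones.

    Sizes larger than the number of available examples are skipped.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return {n: shuffled[:n] for n in sorted(sizes) if n <= len(shuffled)}


def learning_curve(
    examples: Sequence,
    sizes: Sequence[int],
    train_and_score: Callable[[List], float],  # hypothetical: train on subset, return a score
    seed: int = 0,
) -> Dict[int, float]:
    """Map each training-set size to the score obtained when training on that subset."""
    subsets = nested_subsets(examples, sizes, seed)
    return {n: train_and_score(subset) for n, subset in subsets.items()}


def marginal_gains(curve: Dict[int, float]) -> List[Tuple[Tuple[int, int], float]]:
    """Score gained per additional example between consecutive training-set sizes."""
    points = sorted(curve.items())
    return [
        ((n1, n2), (s2 - s1) / (n2 - n1))
        for (n1, s1), (n2, s2) in zip(points, points[1:])
    ]
```

If the per-example gain between, say, 115 and 230 examples is already close to zero, doubling to ~500 is unlikely to help much; if it is still clearly positive, more labeling is probably worth it. Keeping the test set fixed across all sizes is what makes the scores comparable.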

Any help would be greatly appreciated.