Tools to help create good datasets

datistiquo · September 27, 2018, 2:22pm

How do you create well-balanced datasets for intent and entity classification in practice? Do you offer any tools for that in Rasa Platform, @akelad? A tool that only adds data is really not enough: you also have to balance the different sentence types for each intent and each entity. I have several thousand examples and struggle to pick the right ones and delete similar sentences. Would it be a good strategy in practice to have a tool that gives you an overview of the sentence structures in your data? I imagine a simple bag-of-words topic clustering to group my examples, so I can choose the right ones for the TensorFlow embedding. For entities, I think simple statistics over the context words would be enough: you could see how many examples you have for each type of context and so avoid overfitting. Would that be a good approach?
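To make the idea concrete, here is a minimal sketch of both checks using only the Python standard library. It is not a Rasa feature and does not read the Rasa training data format. The function names (`cluster_examples`, `context_word_counts`), the similarity threshold of 0.6, the context window of 2 words, and the `(text, [(start, end, label)])` tuple layout are all assumptions chosen for illustration. The clustering is a simple greedy pass over bag-of-words cosine similarity rather than a real topic model.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    # Lowercase word tokens; a deliberately simple tokenizer (assumption).
    return re.findall(r"\w+", text.lower())


def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_examples(examples, threshold=0.6):
    # Greedy clustering: each example joins the first cluster whose
    # representative (the cluster's first example) is similar enough,
    # otherwise it starts a new cluster.
    clusters = []  # list of (representative bag of words, member texts)
    for text in examples:
        bow = Counter(tokenize(text))
        for representative, members in clusters:
            if cosine(bow, representative) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((bow, [text]))
    return [members for _, members in clusters]


def context_word_counts(examples, window=2):
    # examples: list of (text, [(start, end, label), ...]) with character
    # offsets (hypothetical layout). Counts the words within `window`
    # tokens to the left and right of each entity, per entity label.
    counts = {}
    for text, entities in examples:
        for start, end, label in entities:
            left = tokenize(text[:start])[-window:] if window > 0 else []
            right = tokenize(text[end:])[:window]
            counts.setdefault(label, Counter()).update(left + right)
    return counts


def annotate(text, value, label):
    # Helper for the demo data: locate `value` in `text` and build the tuple.
    start = text.index(value)
    return (text, [(start, start + len(value), label)])


intent_examples = [
    "book a flight to Berlin",
    "book a flight to Paris",
    "what is the weather today",
    "please book a flight to Rome",
]
entity_examples = [
    annotate("book a flight to Berlin", "Berlin", "city"),
    annotate("a flight to Paris please", "Paris", "city"),
    annotate("I live in Rome", "Rome", "city"),
]

# Large clusters point at near-duplicate phrasings that could be thinned out.
for members in cluster_examples(intent_examples):
    print(len(members), members)

# Context words that dominate an entity label hint at a one-sided dataset.
for label, counter in context_word_counts(entity_examples).items():
    print(label, counter.most_common(3))
```

With this toy data the three "book a flight" sentences fall into one cluster, and "flight" and "to" dominate the context of `city`, which is the kind of imbalance the statistics are meant to surface. On several thousand examples the greedy pass depends on input order and on the threshold, so the cluster sizes are a rough guide for manual review rather than a definitive grouping.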

