New Training Data Format Ideas

niveK · June 3, 2020, 6:46pm

Just to be sure I understood correctly, are you saying that by having a YAML format you would be able to use standard YAML tooling to parse/process your data easily?

Correct. YAML is a standard format that we can load into Python for analysis, for example. You can still do that to a limited degree with Rasa’s TrainingData module, but being able to load all story files into a single list of dictionaries, without needing an in-depth understanding of under-the-hood library functions, would make that process much smoother. I think moving to YAML overall empowers chatbot developers to do story-level analysis of their bot to ensure the best performance, while still maintaining maximum readability.
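
To illustrate the kind of analysis meant here, a minimal sketch follows. It assumes story files with a top-level `stories` key, where each story has a `steps` list whose entries carry an `intent` or `action` key; that layout, the `data_dir` argument, and the helper names `load_stories` and `intent_counts` are assumptions for illustration, not a confirmed Rasa API. Parsing relies on the third-party PyYAML package.

```python
from collections import Counter
from pathlib import Path


def load_stories(data_dir):
    """Collect every story from all *.yml files under data_dir into one list of dicts."""
    # PyYAML is third-party; imported here so intent_counts() needs only the stdlib.
    import yaml

    stories = []
    for path in sorted(Path(data_dir).glob("**/*.yml")):
        with open(path, encoding="utf-8") as f:
            content = yaml.safe_load(f)
        # Assumed layout: a mapping with a top-level "stories" list.
        if isinstance(content, dict):
            stories.extend(content.get("stories") or [])
    return stories


def intent_counts(stories):
    """Count how often each intent appears across all story steps."""
    counts = Counter()
    for story in stories:
        for step in story.get("steps") or []:
            if "intent" in step:
                counts[step["intent"]] += 1
    return counts
```

With something like this, `intent_counts(load_stories("data"))` would give a quick view of which intents are over- or under-represented in the stories.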

_isu · June 4, 2020, 2:54am

That sounds like a great idea to explore. Recently, we’ve been struggling with too many intents. We were trying to split the work between the DIET classifier and the response selector to make it more like a hierarchical classification model. But we recently learned that the response selector is not able to trigger any actions and works differently, training on the responses instead of the training data. I thought that was strange. We were wondering if the response selector could also trigger actions, not just text responses, in an upcoming release.

jeroeningelbrecht · June 5, 2020, 6:21am

YAML please

Michael · June 5, 2020, 8:54am

Yaml!

Hans · June 5, 2020, 2:25pm

Thanks for asking. Full YAML seems to be the best option to me.

lkrishnaprasad · June 6, 2020, 5:47am

YAML is preferred.

tatianaf · June 8, 2020, 3:11pm

+1 for option 2: YAML

kitlun · June 8, 2020, 11:58pm

The YAML approach is definitely cleaner and clearer

Akhil · June 22, 2020, 6:25pm

@degiz Option #2 (YAML) looks better, both to read and to write. But I have some suggestions.

I think dumping more things into domain.yml and splitting it into multiple files makes it look messy. We declare the names of utterance responses under actions in domain.yml, but then we define those same utterance responses under responses in domain.yml as well. I feel it would look more organized if domain.yml were used only for declaring the names of utterances, just like intents, entities, etc., and the utterances themselves were defined in a separate file.

I suggest we have a separate file (similar to NLU and stories) for responses, covering both utterance responses and ResponseSelector responses. In the future, even if we have another category of responses, it can go in the responses file.

Since adding buttons and image support to ResponseSelector responses is already mentioned, it would be great if support for channel-specific responses were provided along with it. Not to forget training support for the same in Rasa X.

dcsan · June 23, 2020, 8:25pm

go with yaml 100%, it doesn’t make sense to have two different formats.

parsing .md files is a huge hassle, yaml has standard import/export/syntax tree parsers, editor syntax highlighters etc.

it’s easy to convert yaml to csv if you need to look at it in a spreadsheet
it’s easy for other tools to generate stories in the YAML format.
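
As a rough sketch of the YAML-to-CSV point, the snippet below flattens stories that have already been parsed into a list of dictionaries (for example with PyYAML's `yaml.safe_load`) into CSV text using only the standard library. The story layout (`story`, `steps`, `intent`, `action` keys), the column names, and the function name `stories_to_csv` are assumptions made for illustration.

```python
import csv
import io


def stories_to_csv(stories):
    """Flatten parsed stories into CSV text, one row per step."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["story", "step", "type", "name"])
    for story in stories:
        for index, step in enumerate(story.get("steps") or [], start=1):
            # Assumed convention: a step carries an "intent" or an "action" key.
            for kind in ("intent", "action"):
                if kind in step:
                    writer.writerow([story.get("story", ""), index, kind, step[kind]])
    return buffer.getvalue()
```

The returned string can be written to a `.csv` file and opened directly in a spreadsheet.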

cajoek · June 26, 2020, 8:03am

Hi @degiz! I see that you wrote:

Does that mean that metadata will be available in the nlu pipeline? That would make me very happy!

degiz · August 18, 2020, 9:44am

Hey @Johan

Does that mean that metadata will be available in the nlu pipeline? That would make me very happy!

Currently the YAML parser simply ignores it, but we might change that in the next alpha/RC.

scordee · June 22, 2021, 7:48am

Since I still seem to be facing format-related issues, I am curious about the YAML support for the scenarios listed in the example nlu.yml. Has it already been addressed and released?
