New Training Data Format Ideas

Just to be sure I understood correctly, are you saying that by having a YAML format you would be able to use standard YAML tooling to parse/process your data easily?

Correct. YAML is a standard format that we can load into Python for analysis, for example. You can already do that to a limited degree with Rasa’s TrainingData module, but being able to load all story files into a single list of dictionaries, without needing an in-depth understanding of the library's under-the-hood functions, would make that process much smoother. I think moving to YAML overall empowers chatbot developers to do story-level analysis of their bots to ensure the best performance, while still maintaining maximum readability.
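For illustration, here is a minimal sketch of the story-level analysis described above. It assumes story files use a top-level `stories` list whose entries have `story` and `steps` keys (as in the Rasa 2.x YAML format), and that PyYAML is installed for parsing. The function names are hypothetical, not part of Rasa.

```python
from collections import Counter
from pathlib import Path
from typing import Any, Dict, Iterable, List


def load_story_files(paths: Iterable[Path]) -> List[Dict[str, Any]]:
    """Parse YAML story files into one flat list of story dictionaries.

    Assumes PyYAML is installed; the import is deferred so the helper
    below can also be used on already-parsed data without it.
    """
    import yaml  # PyYAML (assumed dependency)

    stories: List[Dict[str, Any]] = []
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            # safe_load returns None for an empty file, so default to {}.
            document = yaml.safe_load(handle) or {}
        # Each file is assumed to hold a top-level "stories" list.
        stories.extend(document.get("stories", []))
    return stories


def count_step_types(stories: List[Dict[str, Any]]) -> Counter:
    """Count how often each intent and action appears across all stories."""
    counts: Counter = Counter()
    for story in stories:
        for step in story.get("steps", []):
            for key in ("intent", "action"):
                if key in step:
                    counts[(key, step[key])] += 1
    return counts
```

Typical usage would be `count_step_types(load_story_files(Path("data").glob("*.yml")))`, which gives a quick overview of which intents and actions dominate the stories, with no knowledge of Rasa internals required.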


That sounds like a great idea to explore. Recently, we’ve been struggling with too many intents. We were trying to split the work between the DIET classifier and the response selector to make the setup behave more like a hierarchical classification model, but we recently learned that the response selector cannot trigger any actions and works differently, training on the responses instead of the training data. I thought that was strange. We were wondering whether, in an upcoming release, the response selector could also trigger actions instead of text responses.

YAML please

Yaml!

Thanks for asking. Full YAML seems to be the best option to me.

YAML is preferred.

+1 for option 2: YAML

The YAML approach is definitely cleaner and clearer

@degiz Option #2 (YAML) looks better, both for understanding and for writing. But I have some suggestions.

@degiz, I think dumping more things into domain.yml and splitting it into multiple files makes it look messy. We declare (mention) the names of utterance responses under actions in domain.yml, but then we also define those utterance responses under responses in domain.yml itself. I feel it would be more organized to use the domain.yml file only for declaring the names of utterances, just like intents, entities, etc., and to define them in a separate file.

I suggest we have a separate file (similar to NLU and stories) for responses, covering both utterance responses and ResponseSelector responses. In the future, if we add another category of responses, it could go into the same responses file.
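To make the proposal concrete, here is a small sketch of the consistency check such a split would need. It assumes a hypothetical layout in which domain.yml only lists response names under `actions`, and a separate responses file maps each name to its variants under a `responses` key. Both the layout and the function name are illustrative, not an existing Rasa feature.

```python
from typing import Any, Dict, Set


def check_responses(domain: Dict[str, Any], responses: Dict[str, Any]) -> Dict[str, Set[str]]:
    """Compare response names declared in the domain with those defined elsewhere.

    `domain` and `responses` are assumed to be already-parsed YAML documents.
    Names starting with "utter_" are treated as responses, following Rasa's
    naming convention for response actions.
    """
    declared = {name for name in domain.get("actions", []) if name.startswith("utter_")}
    defined = set(responses.get("responses", {}))
    return {
        "missing_definitions": declared - defined,  # declared but never defined
        "undeclared": defined - declared,  # defined but not listed in the domain
    }
```

A check like this is what keeps the two files from drifting apart once declarations and definitions no longer live side by side.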

Since adding button and image support to ResponseSelector responses has already been mentioned, it would be great if support for channel-specific responses were provided along with it. And, not to forget, training support for the same in Rasa X.
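Rasa's domain responses can already carry a `channel` key to define channel-specific variants. The sketch below shows, in simplified form, how such variants might be selected if ResponseSelector responses supported them too. The selection rule (prefer variants tagged for the current channel, otherwise fall back to variants without a channel) is an assumption for illustration, and `pick_variant` is a hypothetical name.

```python
import random
from typing import Any, Dict, List, Optional


def pick_variant(
    variants: List[Dict[str, Any]],
    channel: Optional[str],
    rng: Optional[random.Random] = None,
) -> Optional[Dict[str, Any]]:
    """Pick one response variant, preferring those tagged for `channel`."""
    rng = rng or random.Random()
    matching = [v for v in variants if channel is not None and v.get("channel") == channel]
    # Fall back to channel-agnostic variants when none target this channel.
    candidates = matching or [v for v in variants if "channel" not in v]
    return rng.choice(candidates) if candidates else None
```

Keeping the fallback to untagged variants means a bot never goes silent on a channel nobody wrote a specific response for.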

Go with YAML 100%; it doesn’t make sense to have two different formats.

Parsing .md files is a huge hassle, whereas YAML has standard import/export/syntax-tree parsers, editor syntax highlighters, etc.

It's easy to convert YAML to CSV if you need to look at it in a spreadsheet.
It's easy for other tools to generate stories in the YAML format.
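As a rough illustration of the CSV point, the sketch below flattens already-parsed stories (assuming the same `stories`/`steps` layout, with steps keyed by `intent` or `action`) into one row per step using Python's standard `csv` module. The function name and column names are arbitrary choices for this example.

```python
import csv
import io
from typing import Any, Dict, List


def stories_to_csv(stories: List[Dict[str, Any]]) -> str:
    """Return CSV text with one row per key of each story step."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["story", "step_index", "step_type", "name"])
    for story in stories:
        for index, step in enumerate(story.get("steps", [])):
            # Write one row per key in the step (typically a single "intent" or "action").
            for step_type, name in step.items():
                writer.writerow([story.get("story", ""), index, step_type, name])
    return buffer.getvalue()
```

Writing the returned string to a `.csv` file is enough to open the stories in any spreadsheet tool.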

Hi @degiz! I see that you wrote:

Does that mean that metadata will be available in the NLU pipeline? That would make me very happy! :)

Hey @Johan

Currently the YAML parser simply ignores it, but we might change that in the next alpha/RC.

As I still seem to be facing format-related issues, I am curious about YAML support for the scenarios listed in the example nlu.yml. Has that already been addressed and released?