Rasa NLU training data - JSON or markdown?

Zoltan · August 1, 2018, 5:58pm

Hey all, I am new to Rasa. I have just finished reading through the documentation and am starting to build my first chatbot, so I am making my architectural decisions now.

One thing I am wondering about is whether I should go with JSON or Markdown when creating the Rasa NLU training files. I know that both formats support splitting training data across multiple files (a must for large projects). I also know that Markdown is simpler and more human-readable, while JSON has a trainer (https://github.com/RasaHQ/rasa-nlu-trainer) that can help you create the JSON files.

The question is: what other pros and cons does each format have? Is one a more strategic choice for future enhancements than the other?

Thanks

souvikg10 · August 1, 2018, 6:41pm

It really depends on how your team is going to manage training data. We are a large organization with cross-functional teams, and we want to empower our business users to enrich the training data, so we prefer using a UI for that. Hence our training data format is JSON. However, some power users love Markdown because it is really easy to understand. Some points to consider:

Size of your team
Who is going to enrich the data
Maintainability of the stack

Zoltan · August 1, 2018, 6:50pm

Thanks, yes, that makes sense. We will start out small, but later we will likely end up like you, with a larger cross-functional team enriching the training data. So it would probably be easier to start out with Markdown, but later JSON with its UI would be more beneficial.

I think we will go with JSON now, in anticipation of the future growth of the group that will maintain the training examples.

Thanks

capgos17 · July 25, 2019, 5:10am

Apart from the formatting, is there any difference in functionality between them? One more question: how can we create synonyms in JSON using the UI? @souvikg10

JulianGerhard · July 25, 2019, 5:26am

Hi @capgos17

According to the docs, synonyms can be added in JSON using the following syntax:

{
  "rasa_nlu_data": {
    "entity_synonyms": [
      {
        "value": "New York City",
        "synonyms": ["NYC", "nyc", "the big apple"]
      }
    ]
  }
}

and here is the corresponding link to the documentation.

The deprecated nlu-trainer isn’t / wasn’t capable of doing this, but I think it should be fairly easy to add custom logic that brings this functionality to the UI as well.
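As a rough illustration of what such custom logic could look like outside the UI, here is a minimal Python sketch that merges a synonym entry into training data shaped like the JSON above. The helper name add_synonyms and the sample data are hypothetical and not part of Rasa or the nlu-trainer; the only thing assumed about the format is the "rasa_nlu_data" / "entity_synonyms" structure shown in this post.

```python
import json


def add_synonyms(training_data, value, synonyms):
    """Merge synonyms for `value` into a Rasa-NLU-style JSON dict.

    Hypothetical helper: it only assumes the layout
    {"rasa_nlu_data": {"entity_synonyms": [{"value": ..., "synonyms": [...]}]}}.
    """
    nlu_data = training_data.setdefault("rasa_nlu_data", {})
    entries = nlu_data.setdefault("entity_synonyms", [])

    # If the value already has an entry, extend it without duplicates.
    for entry in entries:
        if entry.get("value") == value:
            existing = entry.setdefault("synonyms", [])
            for synonym in synonyms:
                if synonym not in existing:
                    existing.append(synonym)
            return training_data

    # Otherwise create a new entry, dropping repeated synonyms in order.
    entries.append({"value": value, "synonyms": list(dict.fromkeys(synonyms))})
    return training_data


# Illustrative data only; in practice this would be loaded with json.load().
data = {"rasa_nlu_data": {"common_examples": []}}
add_synonyms(data, "New York City", ["NYC", "nyc"])
add_synonyms(data, "New York City", ["nyc", "the big apple"])
print(json.dumps(data, indent=2))
```

A UI could call something like this when a user submits a synonym form and then write the result back with json.dump(); how that hooks into any particular trainer is left open here.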

Regards Julian
