Rasa NLU training data - JSON or markdown?

Hey all, I am new to Rasa. I have just finished reading through the documentation and am starting to build my first chatbot, so I am making my architectural decisions now.

One thing I am wondering about is whether I should go with JSON or Markdown when creating the Rasa NLU training files. I know that both support splitting training data across multiple files (a must for large projects). I also know that Markdown is simpler and more human-readable, while JSON has a trainer (https://github.com/RasaHQ/rasa-nlu-trainer) that can help you create the JSON files.

The question is: what other pros and cons does each format have? Is one more strategic for future enhancements than the other?



It really depends on how your team is going to manage training data. We are a large organization with cross-functional teams, and we want to empower our business users to enrich training data, so we prefer using a UI to do so. Hence our training data format is JSON. However, some power users love Markdown because it is really easy to understand. Some points to consider:

  • Size of your team
  • Who is going to enrich the data
  • Maintainability of the stack

Thanks, yes, that makes sense. We will start out small, but later we will likely end up like you, with a larger cross-functional team enriching training data. As such, it would probably be easier to start out with Markdown, but later JSON with its UI would be more beneficial.

I think we will go with JSON now, in anticipation of the future growth of the group that will maintain the training examples.



Apart from the formatting, is there any difference in functionality between them? One more question: how can we create synonyms in JSON using the UI? @souvikg10

Hi @capgos17

According to the docs, synonyms can be added in JSON using the following syntax:

  {
    "rasa_nlu_data": {
      "entity_synonyms": [
        {"value": "New York City", "synonyms": ["NYC", "nyc", "the big apple"]}
      ]
    }
  }

Here is the corresponding link to the documentation.

The deprecated nlu-trainer isn’t / wasn’t capable of doing this, but I think it should be fairly easy to add custom logic that brings this functionality to the UI as well.
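To give a rough idea of what such custom logic might look like, here is a minimal Python sketch that adds a synonym entry to training data in the JSON format. The function name `add_synonym` and the in-memory `training` dict are just my own illustration, not part of Rasa or the nlu-trainer; a UI would call something like this before writing the file back to disk.

```python
import json


def add_synonym(data, value, synonyms):
    """Add or extend an entity_synonyms entry (hypothetical helper, not a Rasa API)."""
    nlu = data.setdefault("rasa_nlu_data", {})
    entries = nlu.setdefault("entity_synonyms", [])
    for entry in entries:
        if entry.get("value") == value:
            # The value already exists: merge in new synonyms, skipping duplicates.
            existing = entry.setdefault("synonyms", [])
            for synonym in synonyms:
                if synonym not in existing:
                    existing.append(synonym)
            return data
    # No entry for this value yet: create one.
    entries.append({"value": value, "synonyms": list(synonyms)})
    return data


# Stand-in for the contents of a JSON training file loaded with json.load().
training = {"rasa_nlu_data": {"common_examples": []}}
add_synonym(training, "New York City", ["NYC", "nyc"])
add_synonym(training, "New York City", ["nyc", "the big apple"])
print(json.dumps(training, indent=2))
```

In a real setup you would load the file with `json.load`, apply the change, and save it again with `json.dump`.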

Regards, Julian