Rasa NLU training data - JSON or markdown?

rasa-nlu

(Zoltan Fedor) #1

Hey all, I am new to Rasa. I have just finished reading through all the documentation and am starting to build my first chatbot, so I am making my architectural decisions now.

One thing I am wondering about is whether I should go with JSON or Markdown when creating the Rasa NLU training files. I know that both support splitting training data across multiple files (a must for large projects). I also know that Markdown is simpler and more human-readable, while JSON has a trainer (https://github.com/RasaHQ/rasa-nlu-trainer) that can help you create the JSON files.

What other pros and cons does each format have? Is one a more strategic choice than the other for future enhancements?

Thanks


(Souvik Ghosh) #2

It really depends on how your team is going to manage training data. We are a large organization with a cross-functional team, and we want to empower our business users to enrich the training data, so we prefer to use a UI for that. Hence our training data format is JSON. However, some power users love Markdown because it is really easy to understand, so here are some points to consider:

  • Size of your team
  • Who is going to enrich the data
  • Maintainability of the stack

(Zoltan Fedor) #3

Thanks, yes, that makes sense. We will start out small, but later we will likely end up like you, with a larger cross-functional team enriching the training data. As such, it would probably be easier to start out with Markdown, but later JSON with its UI would be more beneficial.

I think we will go with JSON now, in anticipation of the future growth of the group that will maintain the training examples.
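
For anyone weighing the same switch later, moving between the two formats looks fairly mechanical. Here is a rough sketch, assuming the legacy layouts as I understand them (`rasa_nlu_data` / `common_examples` in JSON, `## intent:` headings with `[text](entity)` annotations in Markdown). The example data and the helper names `to_json` and `to_markdown` are made up for illustration, and this is not Rasa's own converter:

```python
import json

# Hypothetical training examples in the legacy JSON "common_examples" shape.
examples = [
    {"intent": "greet", "text": "hello there", "entities": []},
    {
        "intent": "restaurant_search",
        "text": "show me chinese restaurants",
        "entities": [
            {"start": 8, "end": 15, "value": "chinese", "entity": "cuisine"}
        ],
    },
]


def to_json(examples):
    """Wrap the examples in the assumed legacy JSON envelope."""
    return json.dumps({"rasa_nlu_data": {"common_examples": examples}}, indent=2)


def to_markdown(examples):
    """Render the same examples in the assumed legacy Markdown layout.

    Simplification: each entity's value is assumed to equal the
    annotated span of text (no synonyms).
    """
    by_intent = {}
    for ex in examples:
        text = ex["text"]
        # Annotate from right to left so earlier character offsets stay valid.
        for ent in sorted(ex["entities"], key=lambda e: e["start"], reverse=True):
            span = text[ent["start"]:ent["end"]]
            text = (
                text[:ent["start"]]
                + f"[{span}]({ent['entity']})"
                + text[ent["end"]:]
            )
        by_intent.setdefault(ex["intent"], []).append(text)

    lines = []
    for intent, texts in by_intent.items():
        lines.append(f"## intent:{intent}")
        lines.extend(f"- {t}" for t in texts)
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_json(examples))
    print(to_markdown(examples))
```

The JSON form carries explicit character offsets, which is what a UI tool can edit, while the Markdown form keeps the annotation inline, which is what makes it easy to read and write by hand.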

Thanks