Unknown data format error during NLU training and format conversion

I want to migrate an app to Rasa 1.1.4. I created a new environment and set up Rasa 1 with Rasa X, and the default app works fine. However, I am facing some issues when I add existing NLU data (JSON format) and train the model. I have several training data files, one JSON file per intent. Here is a sample:

{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "text": "Did I really say that",
        "intent": "assist.answer",
        "entities": []
      },
      {
        "text": "I didn't say that",
        "intent": "assist.answer",
        "entities": []
      },
      {
        "text": "Are you sure I said that",
        "intent": "assist.answer",
        "entities": []
      }
    ]
  }
}

In my previous environment I had rasa-nlu 0.14.4, and I had no issues with the JSON data or with training on it. But with Rasa 1, when I try to train the model or convert the NLU data to markdown, I get the error below:

Unknown data format for file

I did validate the JSON using https://jsonlint.com/. Not sure what I’m missing here. Any help? Thanks!

@avinash1 I’m having trouble reproducing your error. I copied your example JSON data into a file called nlu.json and ran rasa data convert nlu --data nlu.json --out nlu.md -f md. Everything works as expected, and training works fine too.

If you have multiple JSON files with NLU data, make sure to put them into a folder. If you convert the data using rasa data convert nlu, your data folder should not contain any story data; otherwise the error Unknown data format for file can occur.
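As a quick pre-check before running the converter on a folder, something like the sketch below can flag files that would not be recognised as NLU JSON. It is only an illustration: the function name check_nlu_json_files is made up for this post, and the check (file parses as JSON and has a top-level rasa_nlu_data key, as in the sample above) is an assumption about what makes a file look like NLU data, not a copy of Rasa's own format detection.

```python
import json
from pathlib import Path


def check_nlu_json_files(folder):
    """Return {filename: problem} for .json files that do not look like NLU JSON.

    Hypothetical helper: it only checks that each file parses as JSON and has
    a top-level "rasa_nlu_data" key, mirroring the sample data in this thread.
    """
    problems = {}
    for path in sorted(Path(folder).glob("*.json")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except ValueError as err:
            # JSONDecodeError and UnicodeDecodeError are both ValueError subclasses.
            problems[path.name] = "not parseable as JSON: {}".format(err)
            continue
        if not isinstance(data, dict) or "rasa_nlu_data" not in data:
            problems[path.name] = "missing top-level 'rasa_nlu_data' key"
    return problems
```

Calling check_nlu_json_files("data") returns an empty dict when every .json file in the folder passes both checks; any other file is listed with the reason it was flagged.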

What commands are you executing?

Thanks for the response, @Tanja. Here’s the command I’m using to convert the data to md: rasa data convert nlu --data data/answer.json --out data/answer.md -f md. This is the only file I have in the data folder right now.

For NLU training, I used rasa train nlu.

Hmm… looks good. What is the exact error message you are getting? There should be more than just Unknown data format for file.

I just tried it the other way round. When I converted nlu.md (the default intent data created with the project) into nlu.json, the convert command worked fine.

Here’s the error log:

Traceback (most recent call last):
  File "/home/theimgclist/miniconda3/envs/rasa2/bin/rasa", line 10, in <module>
    sys.exit(main())
  File "/home/theimgclist/miniconda3/envs/rasa2/lib/python3.7/site-packages/rasa/__main__.py", line 70, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/home/theimgclist/miniconda3/envs/rasa2/lib/python3.7/site-packages/rasa/nlu/convert.py", line 38, in main
    convert_training_data(args.data, args.out, args.format, args.language)
  File "/home/theimgclist/miniconda3/envs/rasa2/lib/python3.7/site-packages/rasa/nlu/convert.py", line 25, in convert_training_data
    td = training_data.load_data(data_file, language)
  File "/home/theimgclist/miniconda3/envs/rasa2/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 56, in load_data
    data_sets = [_load(f, language) for f in files]
  File "/home/theimgclist/miniconda3/envs/rasa2/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 56, in <listcomp>
    data_sets = [_load(f, language) for f in files]
  File "/home/theimgclist/miniconda3/envs/rasa2/lib/python3.7/site-packages/rasa/nlu/training_data/loading.py", line 115, in _load
    raise ValueError("Unknown data format for file {}".format(filename))
ValueError: Unknown data format for file data/answer.json

Not sure why you are getting this error. Are you still facing it? Could you upload your data file? Then I’ll try again.

I am facing the same error too. The JSON file looks fine, but it throws Unknown data format.

I found the solution. When I checked, I found that my data.json file was saved in UTF-8 format, which was causing the problem. I changed the encoding to ANSI in Notepad, and that worked.
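One possible explanation, not confirmed anywhere in this thread: the "UTF-8" option in older versions of Notepad writes a byte order mark (BOM) at the start of the file, and Python's json module refuses to parse text that begins with one. If that is what happened here, removing the BOM while keeping the file in UTF-8 may be a safer workaround than re-saving as ANSI, which can lose non-ASCII characters. A minimal sketch, where strip_utf8_bom is a made-up helper name:

```python
import codecs
from pathlib import Path


def strip_utf8_bom(path):
    """Rewrite the file without a leading UTF-8 BOM.

    Hypothetical helper for illustration. Returns True if a BOM was found
    and removed, False if the file did not start with one.
    """
    file_path = Path(path)
    raw = file_path.read_bytes()
    if not raw.startswith(codecs.BOM_UTF8):
        return False
    # Drop only the three BOM bytes; the rest of the file is left untouched.
    file_path.write_bytes(raw[len(codecs.BOM_UTF8):])
    return True
```

Running strip_utf8_bom("data/answer.json") before the convert or train command would show whether a BOM was present at all. If it returns False, the cause is something else and this workaround does not apply.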

Thanks man, I got it. I think Rasa should handle these things. @Tanja @Juste

Glad you found the solution! Thanks!

Can you please open a bug issue on GitHub? Thanks!

I just faced the same issue while using JSON files. Any reason why the JSON format isn’t supported anymore? It used to work well in prior versions. I had to re-save the JSON file with ANSI encoding before running the rasa data convert utility. A bit annoying, but it worked. Thanks @indranil180!
