Rasa 2.0 training files

Hi all,

I am facing some problems in porting my training data from 1.x to 2.x.

I have created 4 files in the data folder: nlu.yml, responses.yml, rules.yml and stories.yml. While rasa reads the first two, it cannot detect the other two:

> 2020-10-03 14:26:24 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/nlu.yml' is 'rasa_yml'.
> 2020-10-03 14:26:24 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/responses.yml' is 'rasa_yml'.
> 2020-10-03 14:26:24 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/rules.yml' is 'unk'.
> 2020-10-03 14:26:24 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/stories.yml' is 'unk'.

I have tried many formats (indentation, removing blank lines, etc.) but without success. I have checked all files with yamllint, and all are valid. What is puzzling is that rasa seems to like files with more errors:

    yamllint nlu.yml
    nlu.yml
      1:1       warning  missing document start "---"  (document-start)
      7:1       error    wrong indentation: expected 2 but found 0  (indentation)
      115:81    error    line too long (89 > 80 characters)  (line-length)

The file starts:

    version: "2.0"
    nlu:

    ##
    ## Intents
    ##
    - intent: flight_departure_info
      examples: |
        - Yes

The above gets recognised. "rules.yml" is:

    version: "2.0"
    rules:
      - rule: Map "get_started" to "utter_get_started_el"
        steps:
          - intent: get_started
          - action: utter_get_started

Running yamllint rules.yml shows:

    rules.yml
      1:1       warning  missing document start "---"  (document-start)
      31:81     error    line too long (87 > 80 characters)  (line-length)

How can I fix this?

Misleading documentation:

High-Level Structure

Each file can contain one or more keys with corresponding training data. One file can contain multiple keys, as long as there is not more than one of a certain key in a single file. The available keys are:

  • version
  • nlu
  • stories
  • rules
  • e2e_tests
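The documentation excerpt above allows several keys per file but not the same key twice. A quick way to see which top-level keys a file actually has, before training, is a minimal line-based sketch like the one below. It assumes top-level keys are unindented "key:" lines; it is a heuristic, not a YAML parser and not part of Rasa, and top_level_keys/check_keys are hypothetical names. Note that "responses", which the code quoted below also looks for, is not in the documented list, so the sketch reports it as undocumented rather than as an error.

```python
import re
from collections import Counter
from typing import Dict, List

# Keys listed in the "High-Level Structure" documentation quoted above.
DOCUMENTED_KEYS = {"version", "nlu", "stories", "rules", "e2e_tests"}

# Heuristic: a top-level key is an unindented line of the form "key:".
# Quoted keys, flow mappings and multi-document files are not handled.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*):(?:\s|$)")


def top_level_keys(text: str) -> List[str]:
    """Return the unindented 'key:' names in the order they appear."""
    keys = []
    for line in text.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys.append(match.group(1))
    return keys


def check_keys(text: str) -> Dict[str, List[str]]:
    """Report repeated keys and keys outside the documented list."""
    keys = top_level_keys(text)
    counts = Counter(keys)
    return {
        "keys": keys,
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
        "undocumented": sorted(set(keys) - DOCUMENTED_KEYS),
    }


mixed = 'version: "2.0"\nnlu:\n- intent: greet\nrules:\n- rule: r\nnlu:\n'
print(check_keys(mixed)["duplicates"])  # ['nlu']
```

List items ("- rule: ...") and indented lines ("  steps:") are skipped because they do not start with a bare key, which is why only the section keys are counted.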

And then there is the code:

    def is_yaml_nlu_file(filename: Text) -> bool:
        """Checks if the specified file possibly contains NLU training data in YAML.

        Args:
            filename: name of the file to check.

        Returns:
            `True` if the `filename` is possibly a valid YAML NLU file,
            `False` otherwise.
        """
        if not rasa.shared.data.is_likely_yaml_file(filename):
            return False

        try:
            content = rasa.shared.utils.io.read_yaml_file(filename)

            return any(key in content for key in {KEY_NLU, KEY_RESPONSES})
        except (YAMLError, Warning) as e:
            logger.error(
                f"Tried to check if '{filename}' is an NLU file, but failed to "
                f"read it. If this file contains NLU data, you should "
                f"investigate this error, otherwise it is probably best to "
                f"move the file to a different location. "
                f"Error: {e}"
            )
            return False

Only files that have an “nlu” or “responses” key are selected.
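To see what that membership test does with the files from the question, here is a minimal sketch that restates only the final check of the quoted function on already-parsed YAML. Assumptions: KEY_NLU is "nlu" and KEY_RESPONSES is "responses"; the parsed dicts are written out by hand instead of being loaded from disk; the is_likely_yaml_file step and error handling are left out; looks_like_nlu_file is a hypothetical name.

```python
from collections.abc import Mapping
from typing import Any

# Assumption: KEY_NLU == "nlu" and KEY_RESPONSES == "responses" in the quoted code.
NLU_KEYS = {"nlu", "responses"}


def looks_like_nlu_file(content: Any) -> bool:
    """Same membership test as the quoted is_yaml_nlu_file(), on parsed YAML."""
    if not isinstance(content, Mapping):
        # Guard added for this sketch: an empty file parses to None.
        return False
    return any(key in content for key in NLU_KEYS)


# rules.yml from the question, written as the dict a YAML loader would return.
rules_file = {
    "version": "2.0",
    "rules": [
        {
            "rule": 'Map "get_started" to "utter_get_started_el"',
            "steps": [{"intent": "get_started"}, {"action": "utter_get_started"}],
        }
    ],
}

# An empty "nlu:" line parses to a null value, but the key itself is present,
# so the membership test passes.
rules_file_with_nlu = {**rules_file, "nlu": None}

print(looks_like_nlu_file(rules_file))           # False
print(looks_like_nlu_file(rules_file_with_nlu))  # True
```

This only says whether the NLU check claims a file; it says nothing about whether the rules or stories in it are used elsewhere.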

Are you trying to train an NLU-only model? The function you refer to is for identifying NLU data specifically. I can mix stories, rules and NLU in one file, run rasa train, and it recognizes all of them. Using 2.0.0rc4, the following file is read in and trained on correctly:

    version: "2.0"

    rules:
    - rule: greet user
      steps:
      - intent: greet
      - action: utter_greet

    nlu:
    - intent: bot_challenge
      examples: |
        - are you a bot?
        - are you a human?
        - am I talking to a bot?
        - am I talking to a human?

    stories:

    - story: happy path
      steps:
      - intent: greet
      - action: utter_greet
      - intent: mood_great
      - action: utter_happy

And training an NLU-only model with the NLU data split between nlu.yml and the mixed file above works fine too.

Could you post the full files you’re dealing with? I’m guessing there’s some other formatting issue going on.

@mloubser

No, I have all the training data split into 4 files: nlu.yml (has only the “nlu:” key), rules.yml (I had to add an “nlu:” at the top, otherwise it is not recognised as a training file, despite the fact that it contains “rules:”), stories.yml (again with “nlu:” at the top), and responses.yml (which gets recognised).

Ok, I understand what you mean now.

Correction: It isn’t recognized for training TEDPolicy, but it is recognized for training RulePolicy.

@mloubser My experience with 2.0.0rc3 is that it says the data format is unknown, so I am unsure what is loaded (and this is why I added “nlu:” at the top of all files).

@petasis can you train using this file?

    version: "2.0"

    rules:

    - rule: Say goodbye anytime the user says goodbye
      steps:
      - intent: goodbye
      - action: utter_goodbye

    - rule: Say 'I am a bot' anytime the user challenges
      steps:
      - intent: bot_challenge
      - action: utter_iamabot

It’s possible there was a difference in the previous RC; since the RCs are pre-releases, you should use the latest one.

I reformatted your question since it was unclear what was code and what wasn’t; using triple backticks before and after code blocks helps make this clear.

It trains successfully for me using the block above.

@mloubser If I remove “nlu:” from the top of the rules.yml file:

    version: "2.0"

    rules:
    - rule: Map "get_started" to "utter_get_started_el"
      steps:
      - intent: get_started
      - action: utter_get_started_el

During training with 2.0.0rc4:

    2020-10-06 14:03:46 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/nlu.yml' is 'rasa_yml'.
    2020-10-06 14:03:49 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/responses.yml' is 'rasa_yml'.
    2020-10-06 14:03:49 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/rules.yml' is 'unk'.
    2020-10-06 14:03:49 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/stories.yml' is 'unk'.

rules.yml & stories.yml not recognised.
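When a debug run lists many files (see the longer log further down, where the lookup files appear twice), a small filter can pull out the files reported as 'unk'. A minimal sketch; it relies only on the wording of the DEBUG lines quoted in this thread, which may differ in other releases, and formats_from_log/unknown_files are hypothetical names. These lines come from rasa.shared.nlu.training_data.loading, so 'unk' is that NLU loader's verdict.

```python
import re
from typing import Dict, Iterable, List

# Matches lines such as:
#   ... loading  - Training data format of 'data/rules.yml' is 'unk'.
FORMAT_LINE = re.compile(r"Training data format of '([^']+)' is '([^']+)'")


def formats_from_log(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect every format reported for each file, in log order."""
    formats: Dict[str, List[str]] = {}
    for line in lines:
        match = FORMAT_LINE.search(line)
        if match:
            path, fmt = match.groups()
            formats.setdefault(path, []).append(fmt)
    return formats


def unknown_files(formats: Dict[str, List[str]]) -> List[str]:
    """Files reported as 'unk' at least once."""
    return [path for path, fmts in formats.items() if "unk" in fmts]


log = """\
2020-10-06 14:03:46 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/nlu.yml' is 'rasa_yml'.
2020-10-06 14:03:49 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/responses.yml' is 'rasa_yml'.
2020-10-06 14:03:49 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/rules.yml' is 'unk'.
2020-10-06 14:03:49 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/stories.yml' is 'unk'.
"""

formats = formats_from_log(log.splitlines())
print(unknown_files(formats))  # ['data/rules.yml', 'data/stories.yml']
```

Keeping every reported format per file (instead of only the last) matters when the same file is checked more than once, as with data/stories.yml in the later log, which is reported first as 'rasa_yml' and then as 'unk'.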

@mloubser If I add “nlu:” at the top, it works. rules.yml:

    version: "2.0"
    nlu: # Do not remove, else rasa will ignore the file...

    rules:
    - rule: Map "get_started" to "utter_get_started_el"
      steps:
      - intent: get_started
      - action: utter_get_started_el

stories.yml starts with:

    version: "2.0"
    nlu: # Do not remove, else rasa will ignore the file...

    stories:

Training output:

    2020-10-06 14:07:59 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/nlu.yml' is 'rasa_yml'.
    2020-10-06 14:08:02 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/responses.yml' is 'rasa_yml'.
    2020-10-06 14:08:02 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/rules.yml' is 'rasa_yml'.
    2020-10-06 14:08:02 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/stories.yml' is 'rasa_yml'.

Also on rc4?

@mloubser Yes.

@mloubser

    rasa --version
    Rasa Version     : 2.0.0rc4
    Rasa SDK Version : 2.0.0rc1
    Rasa X Version   : None
    Python Version   : 3.8.5 (default, Aug 12 2020, 00:00:00)
    Operating System : Linux-5.8.12-200.fc32.x86_64-x86_64-with-glibc2.2.5
    Python Path      : /usr/bin/python3

The bot you get from rasa init includes a rules-only file. Does that bot work for you? I can’t reproduce this on Python 3.8 with rc4 (or rc3).

@mloubser Yes, the bug exists even in rasa 2.0. Run rasa init and then rasa train --debug, and you will see that stories.yml is listed as “unk”. I assume it is still used for training, but at this point the log gives the user a misleading message.

Ok! Yes, the logging output is confusing. What I want to check, though, is whether that bot runs for you without putting “nlu:” at the top. The logs look the way they do because every parser (NLU, stories, domain) tries to parse every file, since keys can be mixed. So the NLU parser is reporting that stories.yml is unknown for its purposes. I’m opening an issue to make the logging output clearer; let me know if it still works in practice.

Here’s the issue if you want to follow it or contribute: “Improve logging messages for file format recognition during training”, Issue #7000 in RasaHQ/rasa on GitHub.
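The explanation above (every parser tries every file, so the NLU parser labels story and rule files 'unk') can be modelled in a few lines. This is a simplified sketch, not Rasa's loading code: the reader names and the "stories" key set are assumptions for illustration, while the "nlu" key set mirrors the is_yaml_nlu_file() check quoted earlier. The point it shows is that an 'unk' from one reader is harmless as long as some other reader claims the file.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical key sets. "nlu" mirrors the quoted is_yaml_nlu_file() check;
# "stories" is an assumption for illustration.
READER_KEYS: Dict[str, Set[str]] = {
    "nlu": {"nlu", "responses"},
    "stories": {"stories", "rules"},
}


def classify(files: Dict[str, Iterable[str]]) -> Dict[str, Dict[str, str]]:
    """Ask every reader about every file; label it 'rasa_yml' or 'unk'."""
    report: Dict[str, Dict[str, str]] = {}
    for reader, keys in READER_KEYS.items():
        report[reader] = {
            name: "rasa_yml" if keys & set(top_keys) else "unk"
            for name, top_keys in files.items()
        }
    return report


def unclaimed(report: Dict[str, Dict[str, str]]) -> List[str]:
    """Files every reader labelled 'unk', i.e. files no reader in this model picks up."""
    per_reader = list(report.values())
    return [
        name
        for name in per_reader[0]
        if all(labels[name] == "unk" for labels in per_reader)
    ]


# Top-level keys of the four files from the original question.
data_dir = {
    "data/nlu.yml": ["version", "nlu"],
    "data/responses.yml": ["version", "responses"],
    "data/rules.yml": ["version", "rules"],
    "data/stories.yml": ["version", "stories"],
}

report = classify(data_dir)
for reader, labels in report.items():
    for name, label in labels.items():
        print(f"{reader} reader: format of '{name}' is '{label}'")
print(unclaimed(report))  # []
```

The "nlu reader" lines reproduce the pattern of the DEBUG output in this thread. The real behaviour in rc4 can differ from this model: the traceback at the end of the thread shows data/stories.yml reaching the NLU loader and failing with "Unknown data format".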

I cannot tell right now whether they work. The stories/rules involve forms, and I am waiting for a fix to get them working in rasa 2.0.

You mean the rasa init bot? That one should work right away.

No, they do not work if I remove the “nlu:” at the top:

    2020-10-13 12:31:19 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/nlu.yml' is 'rasa_yml'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/responses.yml' is 'rasa_yml'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/rules.yml' is 'unk'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/stories.yml' is 'rasa_yml'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/airline.el.yml' is 'rasa_yml'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/airline.yml' is 'rasa_yml'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/city.yml' is 'rasa_yml'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/flight_statuses.json' is 'unk'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/flying.yml' is 'rasa_yml'.
    2020-10-13 12:31:23 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/time.yml' is 'rasa_yml'.
    2020-10-13 12:31:30 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/airline.el.yml' is 'rasa_yml'.
    2020-10-13 12:31:30 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/airline.yml' is 'rasa_yml'.
    2020-10-13 12:31:30 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/city.yml' is 'rasa_yml'.
    2020-10-13 12:31:30 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/flying.yml' is 'rasa_yml'.
    2020-10-13 12:31:30 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/lookup/time.yml' is 'rasa_yml'.
    2020-10-13 12:31:30 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/nlu.yml' is 'rasa_yml'.
    2020-10-13 12:31:43 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/responses.yml' is 'rasa_yml'.
    2020-10-13 12:32:06 DEBUG    rasa.shared.nlu.training_data.loading  - Training data format of 'data/stories.yml' is 'unk'.
    Traceback (most recent call last):
      File "/home/pepper/.local/bin/rasa", line 8, in <module>
        sys.exit(main())
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/__main__.py", line 116, in main
        cmdline_arguments.func(cmdline_arguments)
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/cli/train.py", line 81, in train
        return rasa.train(
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/train.py", line 43, in train
        return rasa.utils.common.run_in_loop(
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/utils/common.py", line 308, in run_in_loop
        result = loop.run_until_complete(f)
      File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/train.py", line 95, in train_async
        domain = await file_importer.get_domain()
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/utils/common.py", line 119, in decorated
        return await cache.cached_result()
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/importers/importer.py", line 440, in get_domain
        original, e2e_domain = await asyncio.gather(
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/utils/common.py", line 119, in decorated
        return await cache.cached_result()
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/importers/importer.py", line 315, in get_domain
        existing_nlu_data = await self._importer.get_nlu_data()
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/utils/common.py", line 119, in decorated
        return await cache.cached_result()
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/importers/importer.py", line 289, in get_nlu_data
        nlu_data = await asyncio.gather(*nlu_data)
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/importers/rasa.py", line 58, in get_nlu_data
        return utils.training_data_from_paths(self._nlu_files, language)
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/importers/utils.py", line 11, in training_data_from_paths
        training_data_sets = [loading.load_data(nlu_file, language) for nlu_file in paths]
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/importers/utils.py", line 11, in <listcomp>
        training_data_sets = [loading.load_data(nlu_file, language) for nlu_file in paths]
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/nlu/training_data/loading.py", line 60, in load_data
        data_sets = [_load(f, language) for f in files]
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/nlu/training_data/loading.py", line 60, in <listcomp>
        data_sets = [_load(f, language) for f in files]
      File "/home/pepper/.local/lib/python3.8/site-packages/rasa/shared/nlu/training_data/loading.py", line 107, in _load
        raise ValueError(f"Unknown data format for file '{filename}'.")
    ValueError: Unknown data format for file 'data/stories.yml'.