Rasa cannot find story path/skipped training core

After preparing all the required data, running rasa train only trained an NLU model, and the console printed this message:

“No stories present. Just a Rasa NLU model will be trained”

This is weird, because there is a story file under

./data/stories/story_0.md

my working folder looks like this:

│   actions.py
│   config.yml
│   credentials.yml
│   domain.yml
│   endpoints.yml
│   __init__.py
│
├───.idea
│       Demo_NewRasa.iml
│       misc.xml
│       modules.xml
│       workspace.xml
│
├───data
│   │   nlu.md
│   │
│   └───stories
│           story_0.md
│
├───extra_data
│   │   total_word_feature_extractor.dat
│   │
│   └───custom_dict
│           dict1.txt
│
├───models
└───__pycache__
        actions.cpython-35.pyc
        __init__.cpython-35.pyc

and my config file:

language: zh
pipeline:
  - name: MitieNLP
    model: extra_data/total_word_feature_extractor.dat
  - name: JiebaTokenizer
    dictionary_path: extra_data/custom_dict
  - name: MitieEntityExtractor
  - name: EntitySynonymMapper
  - name: MitieFeaturizer
  - name: SklearnIntentClassifier

policies:
  - name: KerasPolicy
    epochs: 500
    max_history: 5
    learning_rate: 0.001
  - name: MemoizationPolicy
    max_history: 5
  - name: MappingPolicy
  - name: FallbackPolicy
    nlu_threshold: 0.2
    core_threshold: 0.2

I tried rasa init and then rasa train in the initialized folder, and it worked. But when I use my own data, Rasa cannot seem to find the story file.

Any advice would be helpful, thanks!

what happens if you try rasa train --data ./data/stories ./data?

I suddenly wondered: could it be an environment variable problem? My environment: Windows 10 + Anaconda virtual env with Python 3.5. But everything else in Anaconda and Python works just fine…

Just tried it; it didn’t work, same result… Also, if I train only Rasa Core with rasa train core, I get:

rasa.core.training.generator - There is no starting story block in the training data. All your story blocks start with some checkpoint. There should be at least one story block that starts without any checkpoint.

That’s strange: this is the first Core training, so certainly no checkpoint has been saved. What is the right way to train Core, or to train both NLU and Core, after rasa init?

A checkpoint in this context is not the saved weights of a model. Rather, it is a marker in the story format that links story blocks together.

Can you share the content of your stories file?

I still wrote the stories following the rules of the old Rasa version, without the checkpoints you mentioned. This is part of the story file:

## Generated Story ask_reimburse_flow_0
* greet
  - utter_greet
* ask_reimburse_flow{"business_type": "\u62a5\u9500","business_demand": "\u6d41\u7a0b"}
  - slot{"business_type": "\u62a5\u9500"}
  - slot{"business_demand": "\u6d41\u7a0b"}
  - action_ask_reimburse_flow
* thanks
  - utter_thanks
  - export

## Generated Story ask_reimburse_flow_1
* greet
  - utter_greet
* ask_reimburse_flow{"business_type": "\u62a5\u9500"}
  - slot{"business_type": "\u62a5\u9500"}
  - action_ask_reimburse_flow
* ask_reimburse_flow{"business_demand": "\u6d41\u7a0b"}
  - slot{"business_demand": "\u6d41\u7a0b"}
  - action_ask_reimburse_flow
  - utter_ask_morehelp
* thanks
  - utter_thanks
  - export

I’m not able to reproduce the errors you got.

I’m assuming you are using the latest version of Rasa?

Yes, but the project works fine with rasa==0.15.0a1, I was planning to migrate this to the latest version of Rasa.

I just noticed something peculiar about your problem.

Are you saying that if you do rasa train then rasa says it can’t find your stories but if you do rasa train core you get a different message?

Anyway, try this command:

rasa train core -s ./data/stories/story_0.md

Yes, that’s what I mean. When I run rasa train core -s ./data/stories/story_0.md, I get:

Training Core model...
Processed Story Blocks: 0it [00:00, ?it/s]
2019-07-31 00:25:29 WARNING  rasa.core.training.generator  - There is no starting story block in the training data. All your story blocks start with some checkpoint. There should be at least one story block that starts without any checkpoint.
2019-07-31 00:25:29 INFO     rasa.core.policies.ensemble  - Skipped training, because there are no training samples.
d:\anaconda3\envs\rasa\lib\site-packages\rasa\core\policies\keras_policy.py:293: UserWarning: Method `persist(...)` was called without a trained model present. Nothing to persist then!
  "Method `persist(...)` was called "
2019-07-31 00:25:29 INFO     rasa.core.agent  - Persisted model to 'C:\Users\ADMINI~1\AppData\Local\Temp\2\tmpznu2eqzk\core'
Core model training completed.

This means Rasa receives an empty story. I modified the Rasa source code (rasa/rasa/train.py, line 140) to see what exactly happened here, printing out the story that Rasa received:

if stories.is_empty():
    print_warning("No stories present. Just a Rasa NLU model will be trained. \n%r\n" % stories.as_story_string())
    exit()

and it outputs:

No stories present. Just a Rasa NLU model will be trained.
''

The zero-length string indicates that something is wrong. The strange thing is that if I run rasa init and then rasa train, everything is just fine. I even checked the encoding of story_0.md, but found nothing new.
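A possible reading of why an empty story set produces the "All your story blocks start with some checkpoint" warning: if the check simply counts blocks that start without a checkpoint, zero loaded blocks also give a count of zero. The sketch below illustrates that logic only. It is not Rasa's implementation; `starting_blocks` is a hypothetical helper, and it assumes the old Markdown story format where a block begins with `## ` and a checkpoint line begins with `>`.

```python
def starting_blocks(story_text):
    """Return names of story blocks whose first step is not a checkpoint.

    Hypothetical helper: assumes '## name' opens a block and a line
    starting with '>' is a checkpoint (old Markdown story format).
    """
    blocks = []  # each entry is [block_name, first_step_line_or_None]
    for raw in story_text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            blocks.append([line[3:].strip(), None])
        elif line and blocks and blocks[-1][1] is None:
            # Remember only the first non-empty line after the header.
            blocks[-1][1] = line
    return [
        name
        for name, first in blocks
        if first is not None and not first.startswith(">")
    ]
```

With an empty string the function returns an empty list, which is indistinguishable from a file in which every block starts with a checkpoint. Under that assumption the warning text is misleading when no stories were loaded at all.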

I finally found the root cause! :grinning:

The reasons why rasa couldn’t find story_0.md were:

  1. NLU training data and Core training data both support Markdown, but the way Rasa decided whether a file was NLU or Core training data was heuristic: if a .md file contains any element from _markdown_section_markers: ['## intent:', '## synonym:', '## regex:', '## lookup:'], it is added to the NLU training data; otherwise it is treated as stories. This seems reasonable, since these four elements are used in NLU Markdown files.

  2. The tricky part is that some of my story blocks begin with "## intent: ...", so the whole file was considered NLU training data.

When I changed every "## intent: ..." in the stories to "## intent_...", Rasa was able to find the stories and train Core.
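The heuristic described above can be approximated in a few lines, which is handy for checking a data folder before training. This is only a sketch: the marker list is the one quoted in this post, while `looks_like_nlu` and `classify_markdown_files` are hypothetical names, and the real Rasa code may differ in detail.

```python
from pathlib import Path

# Marker list as quoted in the post; everything else here is illustrative.
NLU_SECTION_MARKERS = ["## intent:", "## synonym:", "## regex:", "## lookup:"]


def looks_like_nlu(text):
    """Return True if any NLU section marker occurs anywhere in the text."""
    return any(marker in text for marker in NLU_SECTION_MARKERS)


def classify_markdown_files(data_dir):
    """Map each .md file under data_dir to 'nlu' or 'stories'."""
    result = {}
    for path in sorted(Path(data_dir).rglob("*.md")):
        text = path.read_text(encoding="utf-8")
        result[str(path)] = "nlu" if looks_like_nlu(text) else "stories"
    return result
```

Calling `classify_markdown_files("data")` on a layout like the one in this thread would show whether a stories file is being counted as NLU data. A single story header such as "## intent: ..." is enough to flip the whole file.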

Maybe Rasa could improve the way it distinguishes NLU files from story files. :smirk:

While reading the source code, I found it a bit difficult to debug, since every command can only be executed from cmd/bash. I had to insert many print() calls to see the values of suspicious variables… I hope someone can suggest a better way to do this :grimacing:.


I didn’t have ##intent: … in my stories file but I still get this error. The file is in the correct path but it doesn’t recognize it as a stories file. I’ve already spent hours on this. This is just crazy. I didn’t have this issue in the previous version.

You have to be very careful with the format. Try a text editor that highlights spaces, tabs, and other invisible characters, and go through the stories line by line… there’s no better way… I hope Rasa will develop some kind of format-checking tool to flag these things…
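Until such a tool exists, a small script can do the line-by-line pass for invisible characters. The sketch below is not a Rasa utility; `find_whitespace_issues` is a hypothetical helper, and the thread does not establish which of these characters actually upset the story parser, so treat the hits as things to inspect rather than confirmed errors.

```python
def find_whitespace_issues(text):
    """Return (line_number, description) pairs for suspicious whitespace."""
    issues = []
    # A byte order mark can hide in front of the first '##' header.
    if text.startswith("\ufeff"):
        issues.append((1, "byte order mark at start of file"))
        text = text[1:]
    for number, line in enumerate(text.split("\n"), start=1):
        if "\t" in line:
            issues.append((number, "tab character"))
        if "\u00a0" in line or "\u3000" in line:
            issues.append((number, "non-breaking or full-width space"))
        # Also catches a stray carriage return left at the end of a line.
        if line != line.rstrip():
            issues.append((number, "trailing whitespace"))
    return issues
```

Reading the file with `open(path, encoding="utf-8", newline="")` keeps carriage returns visible to the check, because Python's default newline handling would otherwise translate them away.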

The warning here helped me: "/ symbol is reserved as a delimiter to separate retrieval intents from response text identifiers." I had used this symbol in an intent name. I changed that and the problem was gone.
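A quick way to look for this case is to scan the story file for intent lines whose name contains "/". This is a minimal sketch under the assumption that intents appear on lines starting with "* " and that any entity payload follows in braces, as in the stories shown earlier in this thread; `intents_with_slash` is a hypothetical helper.

```python
def intents_with_slash(story_text):
    """Return intent names from '* intent{...}' lines that contain '/'."""
    names = []
    for raw in story_text.splitlines():
        line = raw.strip()
        if line.startswith("* "):
            # Drop the entity payload, if any, before checking the name.
            name = line[2:].split("{", 1)[0].strip()
            if "/" in name:
                names.append(name)
    return names
```

Slashes inside the entity payload are ignored on purpose, since the quoted warning concerns the intent name only.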