Divide stories across multiple files

Hi everyone, I’m using Rasa to create a chatbot and I added multiple stories to my stories.md file. I need to know how I can read stories from other .md files in order to organize my work and make the files more understandable.

Thank you,


Great question, this is really easy to do.

Create a new directory under data called core (data/core) and put your story files there. The filenames are arbitrary: name them whatever you want, just make sure you keep the .md extension.

You can also create a data/nlu directory and do the same thing with your nlu data.
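
If you already have one big stories.md, a small script can do the splitting for you. This is a minimal sketch, not a Rasa tool: it assumes the Rasa 1.x Markdown story format where every story starts with a `## ` header line, and the helper names `split_stories` and `write_story_files`, the sample stories, and the way filenames are derived are all made up for illustration.

```python
import re
from pathlib import Path


def split_stories(markdown_text):
    """Split Markdown story text into {story_name: story_block}.

    Assumes each story starts with a line beginning with '## '.
    """
    stories = {}
    name = None
    lines = []
    for line in markdown_text.splitlines():
        if line.startswith("## "):
            # A new header closes the previous story, if there was one.
            if name is not None:
                stories[name] = "\n".join(lines).strip() + "\n"
            name = line[3:].strip()
            lines = [line]
        elif name is not None:
            lines.append(line)
    if name is not None:
        stories[name] = "\n".join(lines).strip() + "\n"
    return stories


def write_story_files(stories, out_dir):
    """Write one .md file per story into out_dir (for example data/core)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, block in stories.items():
        # Turn the story name into a safe filename and keep the .md extension.
        slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
        path = out / f"{slug}.md"
        path.write_text(block, encoding="utf-8")
        written.append(path)
    return written


# Hypothetical sample input
EXAMPLE = """## greet path
* greet
  - utter_greet

## goodbye path
* goodbye
  - utter_goodbye
"""

stories = split_stories(EXAMPLE)
```

Calling `write_story_files(stories, "data/core")` would then create one file per story. In practice you would probably group several related stories per file rather than one story per file.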


Thank you @jonathanpwheat :smile:

Is there some way I can write actions in multiple files?

I haven’t tried that yet, although it is just a Python file, so maybe you can split the classes out into separate files and import them at the top? (Just guessing, haven’t tried it.) If that’s the case, then the filenames shouldn’t matter there either.

Let me know if that works for you (I’m curious as well, but don’t have time to experiment this week)


Yep, pretty much the same way. Make a folder actions and put your action-files in there. Ofc you need to import stuff from other files if you want to use it in multiple files.
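
A rough sketch of what that split could look like. The file names, class name, action name, and message text are hypothetical; `Action`, `name()`, and `run()` come from rasa_sdk. The try/except stand-in is only there so the snippet runs on a machine without rasa_sdk installed, you would not keep it in a real project.

```python
# Hypothetical layout:
#   actions/__init__.py   contains: from .password import ActionChangePassword
#   actions/password.py   contains the class below
try:
    from rasa_sdk import Action
except ImportError:
    class Action:  # stand-in base class, only used when rasa_sdk is missing
        def name(self):
            raise NotImplementedError


class ActionChangePassword(Action):
    def name(self):
        # Must match the action name listed in the domain.
        return "action_change_password"

    def run(self, dispatcher, tracker, domain):
        dispatcher.utter_message(text="Let's reset your password.")
        return []  # no events to add to the tracker
```

Anything shared between files (helpers, constants) is imported the normal Python way, for example `from .helpers import something` inside the package.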


Thank you so much @jonathanpwheat and @IgNoRaNt23 :smile:

Is there an official way to achieve this with templates/responses?

Unfortunately, not that I’ve seen, no. That’s why I had to build a script to do it for me. It works really well, fwiw :slight_smile:

Sorry to post on a (slightly) older thread. Just wondering if you @jonathanpwheat or anyone else have noticed any problems with splitting the md files up. I’m wondering if there is a certain number of files that might start to cause problems or something to that effect. Also have you come up with your own conventions for when and how you break apart your stories? Just a curiosity :slight_smile:

Hey Andrew, I haven’t had any problems with splitting them up. I currently have 17 (and counting) each of nlu and story files. I’ve also split up my domain.yml file (not supported by Rasa) and reassemble it before I train, using a script I built here - Rasa domain Assembler · GitHub

I break them up by scenario, which allows me to organize them, and makes it easier to have other people work on certain dialog without stepping on each other’s toes.

I’m happy to explain further if you want, or if you have questions about how I split up the domain.yml file.


Great idea, I will definitely follow up on this.

That’s good to hear. My other concern was support with Rasa X. I did some tests with a dummy project as I didn’t see anything in the docs about it (unless it was added recently or I missed it). I was curious about how it added new stories, like how it decided which file to put them in, or whether there were issues with saving. I think it just put them in the file with the latest creation (or updated) date?

It would be nice if there was a way to specify which file it gets saved to. I write all my rasa code in VSCode but we have a non-technical client who would occasionally add some stories from X which of course would mean that it would just add it to most likely the wrong file.

Yeah, the domain split does sound interesting. How do you split up your domain, and how does that flow work?

I haven’t used Rasa X, so I’m not sure where it places its data. I think I read you can hook it to GitHub to pull in your files for CI / CD, but I’m not sure how it saves the data going into the web UI.

I’ve been using VSCode as well, and creating everything with .yml and .md files too. I could see a nice web interface that allows a non-techie to help build this out and have it save to files. I had another developer help with the files for a bit so keeping them organized really helps.

Our bot is a technical support bot and we have a big set of scenarios / problems users face and the files are organized and broken down by those. I guess if you want to get technical, they’re broken down by intent.

For example, change password. I’ll have this structure and increment the index number (000) part of the filename for each problem. Those index numbers cross reference back to internal design documentation we have about the issue / solution, etc.

/data/core/s000_change_password   
/data/nlu/n000_change_password
/data/domain/d000_change_password

The core and nlu files get ingested by Rasa when you train automatically. The domain files get merged into a single domain.yml file with my script.
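
For anyone wondering what such a merge step involves, here is a rough sketch of the idea. It is not the script linked earlier in the thread. It works on already-parsed fragments (plain dicts, for example what a YAML loader would return for each file under data/domain), concatenates list sections without duplicates, and merges mapping sections key by key. It assumes a given section has the same type in every fragment. The function name `merge_domains` and the two sample fragments are illustrative assumptions.

```python
def merge_domains(fragments):
    """Merge parsed domain fragments (dicts) into one domain dict.

    List sections (e.g. intents, actions) are concatenated without
    duplicates. Mapping sections (e.g. slots, templates) are merged
    key by key, with later fragments overriding earlier ones.
    """
    merged = {}
    for fragment in fragments:
        for section, value in fragment.items():
            if isinstance(value, list):
                target = merged.setdefault(section, [])
                for item in value:
                    if item not in target:
                        target.append(item)
            elif isinstance(value, dict):
                merged.setdefault(section, {}).update(value)
            else:
                merged[section] = value
    return merged


# Two hypothetical fragments, as they might look after parsing
d000 = {
    "intents": ["change_password"],
    "actions": ["utter_change_password"],
    "templates": {"utter_change_password": [{"text": "Sure, let's change it."}]},
}
d001 = {
    "intents": ["printing_problem"],
    "entities": ["printer_name", "printer_location"],
    "slots": {"printer_name": {"type": "text"}},
    "actions": ["utter_change_password", "utter_printing_problem"],
}

domain = merge_domains([d000, d001])
print(domain["intents"])  # ['change_password', 'printing_problem']
```

Writing `domain` back out as domain.yml is then a single dump call in whatever YAML library you use.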

I created a shell alias called rtrain that runs my merge script and then rasa train, because I kept forgetting to merge my domain files before I trained.
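
The same "merge, then train" habit can also live in a small Python wrapper instead of a shell alias. This is a sketch of that alternative, not the alias described above: `merge_domain.py` is a placeholder name for whatever merge script you use, and `rasa train` is the standard Rasa CLI command.

```python
import subprocess
import sys


def build_steps(merge_script="merge_domain.py"):
    """Return the commands to run, in order: merge first, then train."""
    return [
        [sys.executable, merge_script],  # placeholder merge script name
        ["rasa", "train"],
    ]


def rtrain(merge_script="merge_domain.py"):
    """Run each step and stop at the first failure."""
    for command in build_steps(merge_script):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so training never runs if the merge step failed.
        subprocess.run(command, check=True)


# Call rtrain() from the project root to merge and then train.
```

Stopping on the first failure is the main design point: if the merge breaks, you find out before spending time on training.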


Hey @andrew.tangowork, have you tried Rasa X with divided stories and NLU files? Did you have any issues with it? I followed @jonathanpwheat’s advice and stored my nlu and stories files in the data/nlu and data/core directories, but Rasa X doesn’t seem to see the nlu and stories files at all. I wonder if somebody else has had this problem too.

Hi @jonathanpwheat, I am curious about,

- generally, how many stories are there in your story files and

- how many intents are there in your nlu files,

- what is the average number of samples in your intents.

I know the numbers vary, but I’d like to know your rough averages, as you have hundreds of stories and intents. You must have a lot of experience with how many examples, intents, and stories it takes to make “intent extraction” and “next action prediction” robust.

Thank you in advance,

Hi @huseyinyilmaz01,

Our implementation is a support agent, and most of our data / training data / story and form “ideas” come from actual email conversations between an agent and a customer. We can mine this data to provide a very rough block of content to use with Rasa. As you can imagine, there are many support issues that arise in a business, and if our bot can’t resolve one, it will collect as much information as necessary to build a profile to send to a human agent.

That said - We typically have 5-10 stories per intent depending on the complexity, so breaking each intent into its own nlu file is necessary simply for finding things.

We probably have 130-150 intents - and we treat each “issue” as an intent. So for example “I can’t print to x”, so printing_problem becomes the intent and we’ll build out that scenario based on the data that gets mined. We’ll sometimes build out partial scenarios and trigger them with an intent as well, so they add up quick.

Our training data varies. We’ll come up with various ways to say something, maybe 10-20 different ways, with different entities inserted if there are different predictable instructions. So for the printing problem, we may have: I can't print to the [HP Laserjet](printer_name) on the [3rd floor](printer_location). Those values get slotted in the event a support ticket has to get created, then we can pump all that data in to help the agent NOT have to re-ask those questions.
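
To make the annotation syntax concrete: in Rasa's Markdown NLU format, `[value](entity_name)` marks an entity inside an example. The snippet below only illustrates how those pairs map to slot values. It is a plain regex, not Rasa's own parser, and `extract_entities` is a made-up helper name.

```python
import re

# Matches [value](entity_name) and captures both parts.
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def extract_entities(example):
    """Return (plain_text, {entity_name: value}) for one annotated example."""
    entities = {name: value for value, name in ENTITY_PATTERN.findall(example)}
    # Replace each annotation with just its value to get the plain sentence.
    plain_text = ENTITY_PATTERN.sub(r"\1", example)
    return plain_text, entities


text, entities = extract_entities(
    "I can't print to the [HP Laserjet](printer_name) "
    "on the [3rd floor](printer_location)"
)
print(text)      # I can't print to the HP Laserjet on the 3rd floor
print(entities)  # {'printer_name': 'HP Laserjet', 'printer_location': '3rd floor'}
```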

Our data mining algorithm also supplies training data, although it’s not as nice looking. We’ve also played with Speech To Text (STT) and had to add a bunch of oddities in the training data, because if you don’t enunciate properly, it’ll get picked up as odd spellings and phrasings. Here’s a great example. There’s a system called ERMS and when the STT translates what you say, it’ll almost reliably “hear” it as yara mess so we added that spelling in as training data and now if you say ERMS it’ll trigger that intent :slight_smile:
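
In the Markdown NLU format, that fix is just one more example line under the intent. A small sketch of generating such a block, where the intent name `ask_erms` is invented for illustration (the thread does not say what the real intent is called) and `render_intent` is a hypothetical helper:

```python
def render_intent(intent, examples):
    """Render one intent block in Rasa's Markdown NLU format."""
    lines = [f"## intent:{intent}"]
    lines += [f"- {example}" for example in examples]
    return "\n".join(lines) + "\n"


block = render_intent(
    "ask_erms",  # hypothetical intent name
    [
        "ERMS",
        "yara mess",  # what the STT engine tends to "hear"
    ],
)
print(block)
```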

The system just grows and splitting this data out really helps development.

I realize this doesn’t fit everyone, and we’re thinking about Rasa X, but I’m afraid I’ll lose my separated domain files and that’ll be nightmarish, I think.


Hi @jonathanpwheat, Thank you so much for this precious information.

If you don’t mind I have some additional questions since we are about to build a chatbot.

  • Is there any rough average for number of examples in a single intent, for your bot?

  • Do you collect the required sequential information via forms, or do you use another way? Or let me say, "do you consider FORMs useful?"

the $64 question :slightly_smiling_face:

  • Do you think that RASA is doing well as a chatbot framework? In other words, there are other competitors some of which belong to tech giants, have you ever thought that it would be better to use another framework instead of RASA?

Hi @jonathanpwheat,

Did I ask something wrong or wrongly?

Sorry for the delay.

Great questions. When I’m developing something new, I’ll rasa init a new project and I’ll use 2 training examples (words) to trigger the intent, mostly because the trainer likes to see 2, but also because if I want to build and test a new idea, I’m literally typing one word at the prompt to trigger my intent.

For our production bot, I average 15 examples for an intent I guess.
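
If you want to check your own numbers, counting examples per intent is easy to script. A minimal sketch, assuming Markdown NLU text where each intent starts with `## intent:<name>` and each example is a `- ` line. The helper name `count_examples` and the sample data are made up.

```python
def count_examples(nlu_markdown):
    """Return {intent_name: number_of_examples} for Markdown NLU text."""
    counts = {}
    current = None
    for line in nlu_markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("## intent:"):
            current = stripped[len("## intent:"):].strip()
            counts.setdefault(current, 0)
        elif stripped.startswith("## "):
            # Other sections (synonyms, regex, lookup tables) are skipped.
            current = None
        elif current is not None and stripped.startswith("- "):
            counts[current] += 1
    return counts


# Hypothetical sample data
SAMPLE = """## intent:change_password
- change password
- reset my password

## intent:printing_problem
- printer

## synonym:printer
- laserjet
"""

counts = count_examples(SAMPLE)
too_few = [name for name, n in counts.items() if n < 2]  # fewer than 2 examples
average = sum(counts.values()) / len(counts)
print(counts, too_few, average)
```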

I do use forms, I love forms to collect required information for something if there are multiple slots required. I have some complex nested if statements to require different slots depending on answers to other slots, it makes the bot very flexible that way. You can do some pretty neat things in the validation and submit methods as well to add more robustness / personality or triggers to extend the overall functionality of your bot.
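
As a rough illustration of that kind of branching: in Rasa 1.x a `FormAction` decides what to ask for in its `required_slots(tracker)` method. The sketch below keeps only the decision logic, as a plain function over a dict of slot values, so it runs on its own. Inside a real form you would read the values with `tracker.get_slot(...)`. The slot names and the rules are invented for the example, not taken from the bot described in this thread.

```python
def required_slots(slot_values):
    """Return the slots the form should collect, given the answers so far."""
    required = ["problem_type"]
    problem_type = slot_values.get("problem_type")

    if problem_type == "printing":
        required += ["printer_name", "printer_location", "shows_error"]
        # Only ask for the error code if the user said one is displayed.
        if slot_values.get("shows_error") is True:
            required.append("error_code")
    elif problem_type == "password":
        required.append("username")

    return required


print(required_slots({"problem_type": "password"}))
# ['problem_type', 'username']
```

Because the list is recomputed after every answer, the form only asks follow-up questions that the earlier answers make relevant.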

If you don’t use forms to collect a set of data, you can use the /inform intent to set slots and entities, but you have to do a lot of testing to make sure your story won’t get sidetracked or the answer to one question triggers a different intent. I prefer forms because it adds a little rigidity once you get into a specific scenario.

I do think Rasa is doing well, and I prefer it over the others. One HUGE selling point for us is that it is completely self-contained and can be hosted at our client’s facilities, firewalled in if necessary, and doesn’t rely on external services like Lex, DialogFlow or other 3rd party services for AI. The only API dependencies we have are when we set up external APIs into our client’s products, like a support ticketing system, or hit some of our internal predictive APIs.

I’m really looking forward to version 2, it has some pretty amazing features I can’t wait to get deep into.

I hope that helps.


Hi @jonathanpwheat,
It is really helpful. It may be a repetition, but your comments are precious. Thank you.
