Story Loading on multiple cores

Hi guys!

We have a project with 280k stories and we want to train a basic Keras policy in Rasa Core with it.

It takes two hours to load the stories and around 150 GB of RAM. We noticed that the loading is performed on a single core.

Is there any plan to move to multiple cores to accelerate this process? Is it a Python limitation that comes with the use of async/await?

Thanks, L-P and the Dialogue Team

Sorry to answer with an unrelated question, but how did you get 280,000 annotated stories?

Hi Leo!

We have these stories because we are trying to move from our current rule-based system to an LSTM policy. It is thus easy for us to create “stories” from our rule-based system and then convert them to Rasa format so we can use them as training data.

I will also add to @Lp-dialogue that we are generating a huge number of stories to be able to get performance that is equivalent to or better than our current rule-based model. I was really surprised that the process of loading stories was not using multiple cores.

I saw that you are using async/await in that process. I think by design it only uses one core, right?
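
To illustrate the point, here is a minimal sketch (not Rasa code; load_story, load_all and the file names are made up for the example). It records which thread each coroutine runs on. With plain asyncio they all share the event loop's single thread, so async/await helps with waiting on I/O but does not spread CPU-bound parsing over several cores; that would need something like separate processes.

```python
import asyncio
import threading


async def load_story(name, seen_threads):
    # Record which OS thread is running this coroutine.
    seen_threads.add(threading.get_ident())
    # Hand control back to the event loop, as an awaited I/O call would.
    await asyncio.sleep(0)
    return name


async def load_all(names):
    seen_threads = set()
    loaded = await asyncio.gather(*(load_story(n, seen_threads) for n in names))
    return loaded, seen_threads


# Hypothetical file names, for illustration only.
loaded, seen_threads = asyncio.run(load_all(["a.md", "b.md", "c.md"]))
print(loaded, len(seen_threads))
```

All three coroutines report the same thread identifier, which is consistent with the loading staying on one core.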

Hey, that is interesting. How did you do that?

@amn41 @Ghostvv this is a use case of training an LSTM based on a rule-based system. We discussed it earlier, and we were wondering whether this is something you’ve seen other people face and whether you have any suggestions on how to train the model with a large training set.

The reason the training set is so large is that we observe that the model keeps learning with more data. We do have early stopping in place, but it is still quite slow at this scale.

Any help on what to investigate would be appreciated.

Thanks, Alexis

I’m guessing Rasa was never designed to work with this many stories. I think the Embedding policy claims to work well in adapting to deviations from the happy path with 30 stories or so, and the Keras Policy with about 100. Isn’t the whole point to build a model that can generalize well from a few hundred cases instead of feeding it every possible permutation? I’m not very familiar with the ML aspect of this, so if someone could explain how you avoid overfitting and losing the benefit of the model’s “artificial intelligence”, I would really appreciate it.

Hi L-P,

Thanks for the post. This is super interesting. We’ve never trained on that many stories before, so I’m not surprised it’s taking a while. I suspect that a great many of your stories implement some business logic. Did you consider wrapping some of that logic up into a form? I suspect you could dramatically reduce the number of stories you generate that way. The ideal case (as I see it) would be that you implement the business logic inside forms and use ML to learn the deviations from the happy path. I’m not sure how those are handled in your rule-based system (if at all), but maybe you could post a couple of example stories and we could have a look.

Hi Alan,

Thanks for your answer. Yes, we are trying to mimic business logic, but it’s pretty complicated. We have around 800 possible questions and 1200 possible outcomes. We are trying to learn the best questions to ask based on the results of the previous ones, to maximize the confidence in the outcome and minimize the number of questions asked.

We do not want to implement a form, since our whole point is to get away from a “hard-coded” system.

Also, with our entire dataset (325k stories), we are not able to train, as we get a SIGKILL right after Rasa is done loading the data…

Hi, thanks for your post. That’s a very interesting idea, training an ML policy to learn an overly complicated rule-based system.

How do you load and featurize the stories? I’m afraid that with this number of stories you’d be better off feeding them directly from the file using tf.keras during training.

I’m not sure about tf.keras, but TensorFlow has a special tf.data.Dataset object for handling data; see the “Importing Data” guide in the TensorFlow Core documentation.

I think they somehow integrated it into tf.keras
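
A rough sketch of the "feed from file" idea, using only the standard library: a generator that reads a file lazily and yields small batches, so the full dataset never sits in memory at once. The one-example-per-line file format and the iter_batches name are assumptions for the example, not anything Rasa provides. A generator like this is the sort of thing tf.data.Dataset.from_generator can wrap, though the featurization step is left out here.

```python
import os
import tempfile


def iter_batches(path, batch_size):
    """Yield lists of at most batch_size lines, reading the file lazily."""
    batch = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            batch.append(line.rstrip("\n"))
            if len(batch) == batch_size:
                yield batch
                batch = []
    # Emit whatever is left over as a final, shorter batch.
    if batch:
        yield batch


# Tiny stand-in file: one hypothetical training example per line.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("\n".join(f"example_{i}" for i in range(5)))
    demo_path = tmp.name

batches = list(iter_batches(demo_path, batch_size=2))
os.remove(demo_path)
print(batches)
```

Only one batch is held in memory at a time while iterating; the list() call here is just to show the result on a toy file.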

@Ghostvv

That would mean bypassing the normal Rasa Core training? As of now, we use the provided agent.load_data() method.

For this number of stories, I think it’d make sense to bypass it or customize it, because it loads all of them, then creates a one-hot featurization and stores it in dense numpy arrays that occupy an unnecessarily large amount of memory. Using a sparse representation should significantly reduce the memory load.
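
To give a feel for the difference, here is a small numpy sketch. The sizes are made up, and it assumes strictly one-hot rows (exactly one active feature per dialogue turn), which may not match the real featurization. Instead of a dense row per turn, it keeps only the index of the active feature and rebuilds dense rows for one batch at a time.

```python
import numpy as np

n_steps = 1000      # hypothetical number of dialogue turns
n_features = 2000   # hypothetical length of each one-hot vector

rng = np.random.default_rng(0)
# Index of the single active feature for each turn.
active = rng.integers(0, n_features, size=n_steps).astype(np.int32)

# Dense one-hot: one float64 per (turn, feature) cell.
dense = np.zeros((n_steps, n_features), dtype=np.float64)
dense[np.arange(n_steps), active] = 1.0

# Compact form: just the int32 index per turn.
print(dense.nbytes, active.nbytes)

# Dense rows can still be rebuilt on demand for a single batch.
rebuilt = np.eye(n_features, dtype=np.float64)[active[:32]]
```

With these toy sizes the dense array takes 16,000,000 bytes and the index array 4,000 bytes. A real sparse matrix format would be needed if several features can be active at once.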

@Ghostvv Let’s say we would like to train the model in increments. For example, train it with 50k stories, save the obtained model. Then, load that model and re-train (fine-tuning style) on 50k other stories.

Is this functionality available in Rasa Core for the Keras Policy?

You could do it if you modify the agent.train method to load the previously trained model, but I wouldn’t recommend it, as the model would likely forget the previous data and overfit to the new data.

Well if we sample the 50k stories according to the overall distribution and our batch size is constant, this would theoretically make no difference at all.

I’m not sure, because you don’t give the old stories to the algorithm, so it will slowly start to override the old behavior with the new one. The exact details depend on how varied your stories are.
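
For reference, the sampling part of the incremental idea can be sketched as below: shuffle the stories once, then cut them into fixed-size chunks, so each chunk is a random sample of the whole set. The random_chunks helper and the integer story ids are placeholders for the example. This only shows the chunking; it says nothing about whether a model trained chunk by chunk forgets the earlier ones.

```python
import random


def random_chunks(items, chunk_size, seed=0):
    """Shuffle a copy of items and split it into consecutive chunks."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return [
        shuffled[i:i + chunk_size]
        for i in range(0, len(shuffled), chunk_size)
    ]


# Stand-in for story identifiers; real stories would be loaded elsewhere.
story_ids = list(range(10))
chunks = random_chunks(story_ids, chunk_size=4)
print([len(c) for c in chunks])
```

Every story lands in exactly one chunk, and the last chunk is shorter when the total is not a multiple of the chunk size.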