“Rasa NLU v0.13 is coming very soon and will include … other highly requested features like multiple training processes”
Does this mean I will be able to train the NLU in small units?
My current problem is that I have 80 intents, so online_training is a real pain when trying to find the number of the correct intent, and it would be much easier if I could train by feature rather than the whole lot all at once.
The only question I have is why this is just NLU; training involves the core as well, right?
You don’t have to use online training to train NLU; it’s definitely easier to train it separately.
I’m afraid you will still have to train the model as a whole though. This just means you can have multiple training processes running simultaneously.
What is the benefit of multiple training processes just for NLU?
I understand that online training is for the core. When you say “train the model as a whole”, which model are you talking about? When I train the NLU it goes into models, and when I train the core it goes into models… so which model will I still need to train as a whole?
@akelad when training on a large dataset, how exactly is this going to work?
Is it one dataset being trained on multiple threads, or multiple datasets being trained in parallel?
My issue is mostly linked to the first one. One approach we thought of is to persist pre-computed vectors of previously seen utterances. Training data usually evolves over time, so for utterances the model has already been trained on, the computed vectors can be persisted, since each one is just the average of the vectors of the tokens in the sentence. This could significantly speed up fitting the model when using sklearn.
Correct me if I am wrong. Are there other ways to improve the performance of the training process?
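Roughly what I have in mind, as a minimal sketch only. The toy word vectors, `featurize`, and `VectorCache` are all hypothetical names made up for illustration; none of this is Rasa's API, and a real pipeline would get its token vectors from the embedding model it already uses.

```python
import json
from typing import Dict, List

# Toy word vectors standing in for a real embedding model (made-up values).
TOY_VECTORS: Dict[str, List[float]] = {
    "book": [1.0, 0.0],
    "a": [0.0, 0.0],
    "flight": [0.0, 1.0],
}
DIM = 2


def featurize(utterance: str) -> List[float]:
    """Average the token vectors; unknown tokens count as zero vectors."""
    tokens = utterance.lower().split()
    if not tokens:
        return [0.0] * DIM
    total = [0.0] * DIM
    for token in tokens:
        vec = TOY_VECTORS.get(token, [0.0] * DIM)
        total = [t + v for t, v in zip(total, vec)]
    return [t / len(tokens) for t in total]


class VectorCache:
    """Keeps the sentence vector of every utterance seen so far."""

    def __init__(self, store: Dict[str, List[float]] = None):
        self._store = dict(store or {})
        self.misses = 0  # how many utterances actually had to be featurized

    def get(self, utterance: str) -> List[float]:
        key = utterance.strip().lower()
        if key not in self._store:
            self.misses += 1
            self._store[key] = featurize(key)
        return self._store[key]

    def to_json(self) -> str:
        """Serialize the cache so it can be written to disk between runs."""
        return json.dumps(self._store)

    @classmethod
    def from_json(cls, text: str) -> "VectorCache":
        return cls(json.loads(text))


cache = VectorCache()
first = cache.get("book a flight")
again = cache.get("Book a flight")  # same utterance, served from the cache
restored = VectorCache.from_json(cache.to_json())
```

The cache would only be valid as long as the tokenizer and the word vectors stay the same between runs; if either changes, the stored vectors have to be thrown away.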
Yes, so the multiple training processes refers to the NLU HTTP server. If you start NLU as a server, it will happily run multiple different trainings in parallel (e.g. for the same project or different projects).
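To illustrate the idea only (this is not Rasa's code or API): each job below is still one complete training run, and several of them simply run side by side. `fake_train` and the project names are hypothetical, and a thread pool is used here just to keep the sketch short.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


def fake_train(project: str, n_examples: int) -> str:
    """Hypothetical stand-in for one full training run; sleeps instead of fitting."""
    time.sleep(0.05)
    return f"{project}: trained on {n_examples} examples"


def train_all(jobs: Dict[str, int]) -> Dict[str, str]:
    """Submit every job at once and collect each result when it finishes."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = {
            project: pool.submit(fake_train, project, n)
            for project, n in jobs.items()
        }
        return {project: future.result() for project, future in futures.items()}


results = train_all({"hotel_bot": 120, "flight_bot": 80})
```

Both jobs are submitted before either one finishes, so they overlap in time rather than queueing one after the other.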
I see the point you are making about online learning with many intents; that is not really usable. We’ve been working on a restructuring of Rasa Core that will also allow online learning over HTTP, opening up the API for better interfaces than the command line. We haven’t built an interface yet, but the change should make that a lot easier.
Thanks Tom, that makes it a bit clearer. I thought it was referring to me training the NLU, but actually it is referring to running multiple NLU models, after they’ve been trained, in the server at the same time. I think someone gave this the wrong name.
Now you can probably see why I was thinking about core training as well, but actually this is a different point altogether, albeit still a relevant one.
No, it is referring to training multiple NLU models at the same time, not running them. I think the description “multiple training processes” is accurate, though I agree it can be interpreted in different ways.