Hi All,
I have divided my training data into multiple yml files based on intents and kept all of them inside data/nlu directory. Training data size seems to be around 1.5 GB.
When I trigger rasa train using nohup as given below, it doesn’t write anything to the file (rasa_train.txt), even after running continuously for 2 to 3 hours.
nohup rasa train > rasa_train.txt &
Can anyone please help me handle this scenario?
Hi! Before we diagnose what might be taking so long, can you please describe what use case you are building your bot for? 1.5 GB of text data is a lot for a chatbot system and may not actually be needed. Is this some kind of scripted / paraphrased data by any chance?
Hi @dakshvar22,
We have an application in the telecom domain, and using its functionality we generated a set of questions for the chatbot.
This training data has 15 entities with 10-12 values each and 3 entities with 2500 values each. Using all combinations of these entity values, we were able to generate 1.5 GB of data with a Python script.
Thanks!
Have you tried handcrafting a training dataset before taking the route of generating data using a python script?
Your training data doesn’t need to consist of all combinations of entity values; that’s where machine learning systems help you out.
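As a rough illustration of that point, here is a minimal sketch of a generator that draws a small random sample of examples instead of enumerating every combination. The entity names, values, and templates are hypothetical placeholders, not taken from your project; the only thing assumed about Rasa is the `[value](entity)` annotation syntax.

```python
import random

# Hypothetical entity values; the real ones would come from your application.
ENTITY_VALUES = {
    "plan": ["prepaid", "postpaid", "family"],
    "service": ["roaming", "data pack", "voicemail"],
}

# Hypothetical templates using the [value](entity) annotation syntax.
TEMPLATES = [
    "how do I activate [{service}](service) on my [{plan}](plan) plan",
    "is [{service}](service) included in [{plan}](plan)",
]


def sample_examples(n, seed=0):
    """Return up to n distinct annotated examples drawn at random."""
    rng = random.Random(seed)
    examples = set()
    attempts = 0
    # Bound the number of attempts so the loop always terminates,
    # even if n exceeds the number of distinct examples available.
    while len(examples) < n and attempts < n * 50:
        attempts += 1
        template = rng.choice(TEMPLATES)
        values = {name: rng.choice(options) for name, options in ENTITY_VALUES.items()}
        examples.add(template.format(**values))
    return sorted(examples)


if __name__ == "__main__":
    for line in sample_examples(5):
        print(f"- {line}")
```

A few dozen such examples per intent is usually a more sensible starting point than the full cross product, and the sample size can be raised later if evaluation shows gaps.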
For entity types which have a lot of possible unique values, have you tried taking the lookup tables approach? If that didn’t work, could you please mention why?
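To make the lookup table suggestion concrete, here is a minimal sketch that writes one from a list of values, assuming the Rasa 2.x-or-later YAML training data format. The entity name `device_model`, the values, and the `version` string are assumptions for illustration; adjust them to your own setup.

```python
def build_lookup_yaml(entity, values, version="3.1"):
    """Build the text of a Rasa NLU YAML file containing one lookup table.

    The version string is an assumption; use the one matching your Rasa release.
    """
    lines = [
        f'version: "{version}"',
        "nlu:",
        f"- lookup: {entity}",
        "  examples: |",
    ]
    # Each value becomes one list item inside the block scalar.
    lines += [f"    - {value}" for value in values]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; the real table could hold all 2500 entries.
    text = build_lookup_yaml("device_model", ["router x100", "modem z20"])
    print(text)
```

As far as I know, a lookup table only has an effect if the pipeline contains a component that reads it, such as RegexFeaturizer or RegexEntityExtractor, and the training data still needs some annotated examples of the entity.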
For entities which do not have a lot of unique values (like the 15 entities you mentioned), it’s not necessary to have a training example for every possible way in which that entity could be expressed. You can start by adding a few patterns by hand and then iterate by looking at data from incoming conversations. These pages from the docs will help motivate the approach I am suggesting above even more:
Thanks
@dakshvar22
I will check and try it out.
Hi @dakshvar22, I tried the lookup table approach. It will help us reduce the training data size while still covering all the combinations of questions for the intent.
However, I am unable to extract entities using the lookup table. I explained the issue here.
@dakshvar22 If I am doing anything wrong, please point it out.