How to load training data from a database

We have our training data stored in a database and are looking for a way for Rasa to load that training data directly from the database and train the chatbot.

I have read many blogs and learned that there is no built-in feature in Rasa to import training data from a database. I found two solutions, which are given below:

  1. Write a Python script to generate the nlu.yml, domain.yml, etc. files used to train the chatbot.
  2. Write our own Rasa custom importer to load the data directly.

Actually, we have a huge amount of training data, somewhere around 15 GB to 20 GB. If we go with solution #1, training might crash due to the huge size of the .yml file.

Could anyone please help us with the appropriate way to handle this scenario?

Welcome to the forum :)

The two ways are indeed the ones you mentioned.

Solution #1 is easier to implement, but, as you said, will take a lot more storage because data will basically be duplicated.

Solution #2 is harder but cleaner and more stable, and will not take more storage.

I suggest going with Solution #2.

Thank you @ChrisRahme for your valuable feedback.

Actually, our training data contains only questions with entities and their values, stored in an Oracle database. If I go with Solution #2, how can I use a Rasa custom importer to fetch those questions directly from the database and train the chatbot?

Can anyone please help me with a link or some Python code to achieve this?

Referencing the docs here: Training Data Importer.

Could someone please explain what exactly this does? To me, it seems like it just loads the data from the files themselves.

You only have to look at Writing a Custom Importer.

You basically just have to write Python code that transforms your data into the regular training data format and domain.
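
Something like this could work as a starting point. This is only a rough, untested sketch: the oracledb driver, the training_examples table, and its text/intent columns are placeholders for your own schema, and config.yml/domain.yml are assumed to stay as normal files. The method names follow the Rasa 3.x TrainingDataImporter interface from the Training Data Importer docs; in Rasa 2.x the same methods are async.

```python
# Sketch of a custom importer that pulls NLU examples from a database.
# Assumed: the oracledb driver, a training_examples table with text and
# intent columns, and config.yml / domain.yml kept as regular files.
from typing import Dict, List, Optional, Text, Union

import oracledb  # pip install oracledb

from rasa.shared.core.domain import Domain
from rasa.shared.core.training_data.structures import StoryGraph
from rasa.shared.importers.importer import TrainingDataImporter
from rasa.shared.nlu.training_data.message import Message
from rasa.shared.nlu.training_data.training_data import TrainingData
from rasa.shared.utils.io import read_yaml_file


class OracleImporter(TrainingDataImporter):
    """Builds Rasa NLU training data from database rows instead of nlu.yml files."""

    def __init__(
        self,
        config_file: Optional[Text] = None,
        domain_path: Optional[Text] = None,
        training_data_paths: Optional[Union[List[Text], Text]] = None,
        **kwargs: Dict,
    ) -> None:
        self._config_file = config_file
        self._domain_path = domain_path

    def get_config(self) -> Dict:
        # Reuse the regular config.yml for the pipeline and policies.
        return read_yaml_file(self._config_file)

    def get_config_file_for_auto_config(self) -> Optional[Text]:
        # Needed by newer Rasa 3.x releases; harmless to keep otherwise.
        return self._config_file

    def get_domain(self) -> Domain:
        # Keep the domain in a normal domain.yml file.
        return Domain.load(self._domain_path)

    def get_stories(self, exclusion_percentage: Optional[int] = None) -> StoryGraph:
        # NLU-only in this sketch: no stories are generated from the database.
        return StoryGraph([])

    def get_nlu_data(self, language: Optional[Text] = "en") -> TrainingData:
        # Stream rows from the database and turn each one into a Message.
        examples = []
        with oracledb.connect(user="...", password="...", dsn="...") as connection:
            cursor = connection.cursor()
            cursor.execute("SELECT text, intent FROM training_examples")
            for text, intent in cursor:
                examples.append(Message.build(text=text, intent=intent))
        return TrainingData(training_examples=examples)
```

If I understand the docs correctly, you then register the importer in config.yml under the importers key (e.g. importers: - name: "your_module.OracleImporter", where your_module is whatever module path you put the class in) and run rasa train as usual, and Rasa will call get_nlu_data() instead of reading nlu.yml files.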

Thank you so much @ChrisRahme.

Actually, I thought of going with a different approach here: creating multiple nlu.yml files and restricting each file to some size threshold. Once that threshold is reached, create a new nlu_xyz.yml file, and keep doing that for the entire training data.

I am thinking that if I go with loading the training data from the database, the database query might crash due to the huge size of the data (15 GB - 20 GB), so I think it's better to create multiple nlu.yml files with a limited size.
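
Roughly, I imagine the splitting would look something like this. Just a sketch: fetch_examples() is a placeholder for the real database query, and the threshold value is arbitrary.

```python
# Minimal sketch of splitting NLU examples across several nlu_<n>.yml files.
# fetch_examples() is a placeholder for whatever yields (intent, text) pairs
# from the database; MAX_EXAMPLES_PER_FILE is an arbitrary threshold.
from collections import defaultdict
from pathlib import Path

MAX_EXAMPLES_PER_FILE = 50_000
OUTPUT_DIR = Path("data")


def fetch_examples():
    """Placeholder: yield (intent, text) pairs, e.g. from a DB cursor."""
    yield "greet", "hello there"
    yield "greet", "hi"
    yield "goodbye", "see you later"


def write_chunk(index: int, grouped: dict) -> None:
    """Write one nlu_<index>.yml file in Rasa's YAML training data format."""
    # Note: if questions can contain YAML-special characters, build the file
    # with a YAML library instead of plain string formatting.
    lines = ['version: "3.1"', "nlu:"]
    for intent, texts in grouped.items():
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        lines.extend(f"    - {text}" for text in texts)
    out_path = OUTPUT_DIR / f"nlu_{index}.yml"
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def main() -> None:
    OUTPUT_DIR.mkdir(exist_ok=True)
    grouped = defaultdict(list)
    count, file_index = 0, 1
    for intent, text in fetch_examples():
        grouped[intent].append(text)
        count += 1
        if count >= MAX_EXAMPLES_PER_FILE:
            write_chunk(file_index, grouped)
            grouped, count, file_index = defaultdict(list), 0, file_index + 1
    if grouped:
        write_chunk(file_index, grouped)


if __name__ == "__main__":
    main()
```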

@ChrisRahme what do you think about this approach?

Maybe I’m missing something, but isn’t creating multiple small files the same as one big one?

@ChrisRahme having one big file means it will always open in view mode and we won’t be able to manipulate it, so smaller files give us an added advantage.

I see. You can do that, no problem. Rasa understands when there are multiple NLU/stories/rules files, but use only one domain file.

@ChrisRahme thank you so much, I will try the same and see how it goes.

Did you end up loading the data with a custom importer? If yes, can you help me with the code? I have a DB with intents and responses, but I am finding it hard to write the importer.

Hi @Amandeep,

Actually, we initially thought of saving the data in a DB, but later we decided to go with .yml files instead. I think this might help you.

My client has a database in which there are questions and answers. What should I do to automate the process of copying the intents and responses from the database into the nlu, domain, and rules files? Can I write a custom importer which will fetch the data from the database and write the domain, nlu, and rules files?

@Amandeep,

The recommended way is to write a custom importer to load the training data from the DB. If you find that difficult, write a simple Python script that reads the data from the DB and generates the .yml files.
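
For the responses side, the generation script could look something like this. This is only a sketch: the sqlite3 connection is a stand-in for whatever DB-API driver your database needs (cx_Oracle, psycopg2, ...), and the faq table with intent/answer columns is an assumed schema.

```python
# Sketch: generate the responses section of domain.yml from database rows.
# Assumed: a faq table with intent and answer columns; sqlite3 stands in
# for any DB-API driver (cx_Oracle, psycopg2, ...).
import sqlite3

import yaml  # pip install pyyaml


def build_domain(db_path: str) -> dict:
    connection = sqlite3.connect(db_path)
    cursor = connection.execute("SELECT intent, answer FROM faq")

    intents = []
    responses = {}
    for intent, answer in cursor:
        intents.append(intent)
        # Each intent gets one utter_ response holding the stored answer.
        responses[f"utter_{intent}"] = [{"text": answer}]
    connection.close()

    return {
        "version": "3.1",
        "intents": sorted(set(intents)),
        "responses": responses,
    }


if __name__ == "__main__":
    domain = build_domain("faq.db")
    with open("domain.yml", "w", encoding="utf-8") as f:
        yaml.safe_dump(domain, f, sort_keys=False, allow_unicode=True)
```

You would still need rules (or stories) that map each intent to its utter_ response, which the same script could generate in a second pass.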

To load training data from a database in Rasa, you can follow these general steps:

  1. Establish a connection to your database. The specific method depends on the database you are using (Oracle, MySQL, PostgreSQL, SQLite, etc.); use the appropriate Python library or driver.
  2. Execute a query to retrieve the training data. The query will depend on your database schema and how the training data is structured.
  3. Transform the retrieved rows into a Rasa-compatible format. Rasa expects NLU and Core training data, typically as YAML (JSON is also supported).
  4. Load the formatted data. For example, you can use the load_data() function from the rasa.shared.nlu.training_data.loading module to load NLU data, as in the short check below.
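
A quick sanity check of a generated NLU file might look like this (assuming a data/nlu_1.yml file exists; the path is just an example):

```python
# Load one generated NLU file and report what Rasa parsed out of it.
from rasa.shared.nlu.training_data.loading import load_data

training_data = load_data("data/nlu_1.yml")
print(f"{len(training_data.training_examples)} examples, "
      f"{len(training_data.intents)} intents")
```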

I think so as well.