Where can I find NLU datasets related to mental health for bot training?

Currently, I’m working on a mental health assistant, where users will share their thoughts and I have to identify the type of condition from the input. Where can I find such datasets for the training part?

1 Like

Hello!

Is that repo suitable for your project’s purpose?

1 Like

Thanks, @artemsnegirev, for your help. Let me check the repo.

2 Likes

My idea is to build a chatbot that behaves like a therapist/psychiatrist and helps the person feel a little more at ease, in a simple conversational manner.

1 Like

Yes, I’m trying to achieve something similar to this. For open-ended conversations with people, the bot should classify intents correctly; only then will we be able to write interactive stories to guide users down different paths. I’m looking for interactions between therapists/psychiatrists/psychologists and users to train the bot to understand different intents (different conditions). @Jasmine69, do you have records of such conversations?
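
Roughly, I picture the training data as labeled examples like the sketch below (the intent names and utterances are just made-up placeholders, not a real clinical taxonomy), which could then be converted into whatever format the bot framework expects:

from collections import defaultdict

# Hypothetical labeled NLU examples -- intent names and texts are placeholders only
nlu_examples = [
    ("I can't sleep and keep replaying everything in my head", "anxiety"),
    ("I haven't felt like doing anything for weeks", "low_mood"),
    ("I feel guilty every time I eat something", "disordered_eating"),
]

# Group the examples by intent to check how balanced the data is
by_intent = defaultdict(list)
for text, intent in nlu_examples:
    by_intent[intent].append(text)

for intent, texts in by_intent.items():
    print(intent, len(texts))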

1 Like

Found a solution for gathering data related to different mental health conditions by scraping Reddit with a Python script. It’s a bit slow and involves some manual work, but it works for me :innocent:

1 Like

Sounds interesting, could you share the thread here?

1 Like

The first step is to create a Reddit app (under your account’s app preferences) to get API access to subreddits.

Once it’s created, make a note of the personal use script ID and the secret token; you will use these when making API requests.

After that, use the following script to fetch posts from a subreddit by name.

import requests

# Authenticate with the credentials from the Reddit app you created
auth = requests.auth.HTTPBasicAuth('<your_personal_use_script>', '<your_secret_token>')
data = {'grant_type': 'password',
        'username': '<your_username>',
        'password': '<your_password>'}
headers = {'User-Agent': 'your_app_name/0.0.1'}

# Request an OAuth access token and add it to the request headers
res = requests.post('https://www.reddit.com/api/v1/access_token',
                    auth=auth, data=data, headers=headers)
TOKEN = res.json()['access_token']
headers = {**headers, 'Authorization': f"bearer {TOKEN}"}

# Fetch up to 100 posts from the subreddit's hot listing and print their titles
res = requests.get("https://oauth.reddit.com/r/EatingDisorders/hot",
                   headers=headers, params={'limit': '100'})
for post in res.json()['data']['children']:
    print(post['data']['title'])

Currently, I’m printing only the titles of the posts; more data can be found in the result returned from the API call. We can look for suitable subreddits manually first, then use each subreddit’s name in the URL to get up to 100 posts from its hot listing. Sharing the original blog post that I used as a reference.

https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c
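
In case it helps, here’s a rough extension of the loop above (just a sketch; the 'selftext' and 'subreddit' fields are taken from the standard listing response) that also keeps the post body and writes everything to a CSV for labeling later:

import csv

# Continue from the `res` returned by the /hot request above
rows = []
for post in res.json()['data']['children']:
    d = post['data']
    # keep the subreddit name alongside the title and body text as a rough label
    rows.append({'subreddit': d['subreddit'], 'title': d['title'], 'text': d['selftext']})

with open('reddit_posts.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['subreddit', 'title', 'text'])
    writer.writeheader()
    writer.writerows(rows)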

Hope it helps.

2 Likes

For mental health datasets, check out Kaggle and the UCI Machine Learning Repository.