Where can I find NLU datasets related to mental health for bot training?

Currently, I’m working on a mental health assistant, where users will share their thoughts and I have to identify the type of condition from the input. Where can I find such datasets for the training part?

1 Like

Hello!

Is that repo suitable for your project’s purpose?

1 Like

Thanks, @artemsnegirev, for your help. Let me check the repo.

2 Likes

My idea is to build a chatbot that behaves like a therapist/psychiatrist and helps the person feel a little more at ease, in a simple conversational manner.

1 Like

Yes, I’m trying to achieve something similar to this. For open-ended conversations with people, the bot should classify intents correctly; only then will we be able to write interactive stories to guide users down different paths. I’m looking for interactions between therapists/psychiatrists/psychologists and users to train the bot to understand different intents (different conditions). @Jasmine69, do you have records of such conversations?
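
Roughly, I picture the training data as labeled examples like the sketch below (the intent names and utterances are just made-up placeholders, not a real clinical taxonomy), which could then be converted into whatever format the bot framework expects:

from collections import defaultdict

# Hypothetical labeled NLU examples -- intent names and texts are placeholders only
nlu_examples = [
    ("I can't sleep and keep replaying everything in my head", "anxiety"),
    ("I haven't felt like doing anything for weeks", "low_mood"),
    ("I feel guilty every time I eat something", "disordered_eating"),
]

# Group the examples by intent to check how balanced the data is
by_intent = defaultdict(list)
for text, intent in nlu_examples:
    by_intent[intent].append(text)

for intent, texts in by_intent.items():
    print(intent, len(texts))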

1 Like

Found a solution for gathering data related to different mental health conditions by scraping Reddit with a Python script. It’s a bit slow and involves some manual work, but it works for me :innocent:

1 Like

Sounds interesting, could you share the thread here?

1 Like

The first step is to create a Reddit app (under your account’s app preferences) to get API access to subreddits.

Once it’s created, make a note of the personal use script ID and the secret token; you will use these when making API requests.

After that, use the following script to fetch posts from a subreddit by name.

import requests

# Authenticate with the credentials from the Reddit app you created
auth = requests.auth.HTTPBasicAuth('<your_personal_use_script>', '<your_secret_token>')
data = {'grant_type': 'password',
        'username': '<your_username>',
        'password': '<your_password>'}
headers = {'User-Agent': 'your_app_name/0.0.1'}

# Request an OAuth access token and add it to the request headers
res = requests.post('https://www.reddit.com/api/v1/access_token',
                    auth=auth, data=data, headers=headers)
TOKEN = res.json()['access_token']
headers = {**headers, 'Authorization': f"bearer {TOKEN}"}

# Fetch up to 100 posts from the subreddit's hot listing and print their titles
res = requests.get("https://oauth.reddit.com/r/EatingDisorders/hot",
                   headers=headers, params={'limit': '100'})
for post in res.json()['data']['children']:
    print(post['data']['title'])

Currently, I’m printing only the titles of the posts; more data can be found in the result returned from the API call. We can look for suitable subreddits manually first, then use each subreddit’s name in the URL to get up to 100 posts from its hot listing. Sharing the original blog post that I used as a reference.

https://towardsdatascience.com/how-to-use-the-reddit-api-in-python-5e05ddfd1e5c
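
In case it helps, here’s a rough extension of the loop above (just a sketch; the 'selftext' and 'subreddit' fields are taken from the standard listing response) that also keeps the post body and writes everything to a CSV for labeling later:

import csv

# Continue from the `res` returned by the /hot request above
rows = []
for post in res.json()['data']['children']:
    d = post['data']
    # keep the subreddit name alongside the title and body text as a rough label
    rows.append({'subreddit': d['subreddit'], 'title': d['title'], 'text': d['selftext']})

with open('reddit_posts.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['subreddit', 'title', 'text'])
    writer.writeheader()
    writer.writerows(rows)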

Hope it helps.

2 Likes

For mental health datasets, check out Kaggle and the UCI Machine Learning Repository.