Where can I find NLU datasets related to mental health for bot training?

Currently, I’m working on a mental health assistant, where users share their thoughts and I have to identify the type of condition from their input. Where can I find such datasets for training?


Is that repo suitable for your project’s purpose?

Thanks for your help, @artemsnegirev. Let me check the repo.


My idea is to build a chatbot that behaves like a therapist/psychiatrist and, in a simple conversational manner, helps the person feel a little more at ease.

Yes, I’m trying to achieve something similar. In open-ended conversations, the bot must first classify the user’s intent correctly; only then can we write interactive stories that guide users down different paths. I’m looking for conversations between therapists/psychiatrists/psychologists and users to train the bot to recognize different intents (i.e., different conditions). @Jasmine69, do you have records of such conversations?
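To make “training the bot to understand different intents” concrete: labeled user messages usually end up in an NLU training-data file. The sketch below assumes a Rasa-style YAML format (the thread talks about intents and stories, but never names a framework), and the intent names and messages are hypothetical. It collapses each message to a single line and drops empty and duplicate examples before writing them out.

```python
def to_nlu_yaml(examples_by_intent, version="3.1"):
    """Render {intent: [messages]} as Rasa-style NLU training data (YAML text).

    Assumes intent names are simple identifiers such as "express_anxiety";
    names containing spaces or YAML special characters are not escaped.
    """
    lines = [f'version: "{version}"', "", "nlu:"]
    for intent, messages in examples_by_intent.items():
        # Collapse newlines/tabs so each example fits on one line, then
        # de-duplicate while keeping the original order (dict keys are ordered).
        normalized = (" ".join(msg.split()) for msg in messages)
        cleaned = [text for text in dict.fromkeys(normalized) if text]
        if not cleaned:
            continue  # an intent with no usable examples is skipped entirely
        lines.append(f"- intent: {intent}")
        lines.append("  examples: |")
        lines.extend(f"    - {text}" for text in cleaned)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical intents and messages, for illustration only.
    data = {
        "express_anxiety": [
            "I can't stop worrying about everything",
            "My heart races\nbefore every meeting",
        ],
        "express_low_mood": ["Nothing feels worth doing anymore"],
    }
    print(to_nlu_yaml(data), end="")
```

Writing the YAML by hand keeps the sketch dependency-free; with many examples or unusual characters, a YAML library or the framework’s own data tooling would be the safer choice.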

I found a way to gather data on different mental health conditions by scraping Reddit with a Python script. It’s a bit slow and involves some manual work, but it works for me :innocent:

Sounds interesting, could you share the thread here?

The first step is to create a Reddit app to get API access to subreddits.

Once it is created, note the personal use script (the app’s client ID) and the secret token; you will use both when making API requests.

After that, use the following script to fetch posts from a subreddit by its name:

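import requests
auth = requests.auth.HTTPBasicAuth('<your_personal_use_script>', '<your_secret_token>')  # app client ID and secret
data = {'grant_type': 'password',  # "script" apps authenticate with the password grant
        'username': '<your_username>',
        'password': '<your_password>'}
headers = {'User-Agent': 'your_app_name/0.0.1'}  # Reddit asks for a descriptive User-Agent
res = requests.post('https://www.reddit.com/api/v1/access_token', auth=auth, data=data, headers=headers)
res.raise_for_status()  # stop early if the token request failed
TOKEN = res.json()['access_token']
headers = {**headers, 'Authorization': f"bearer {TOKEN}"}  # authenticated calls go to oauth.reddit.com
res = requests.get("https://oauth.reddit.com/r/EatingDisorders/hot",
                   headers=headers, params={'limit': '100'})  # 100 is the maximum page size
res.raise_for_status()
for post in res.json()['data']['children']:
    print(post['data']['title'])  # each child is one post; 'data' also holds 'selftext', 'score', etc.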

Currently, I’m printing only the post titles; the JSON returned by the API call contains much more data for each post. You can browse Reddit manually first to find relevant subreddits, then put each subreddit’s name in the URL to fetch its 100 current “hot” threads (swap “hot” for “top” in the URL to get top-rated posts instead). I’m sharing the original blog post I used as a reference.
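Because a single request returns at most 100 posts, collecting a larger dataset means paging through the listing. Reddit listings return an `after` cursor that you pass back to get the next page. The sketch below assumes the `headers` dict (with the bearer token) built by the script above; the function names, the `max_posts` cap, and the one-second pause are illustrative choices, not part of the original script. It also keeps each post’s body (`selftext`), not just the title, since the body usually carries more of the user’s own wording.

```python
import time


def parse_listing(payload):
    """Split one Reddit listing response into (posts, after_cursor)."""
    listing = payload["data"]
    posts = [
        {
            "title": child["data"].get("title", ""),
            "selftext": child["data"].get("selftext", ""),  # post body; empty for link posts
        }
        for child in listing["children"]
    ]
    return posts, listing.get("after")  # "after" is None on the last page


def fetch_posts(headers, subreddit, listing="hot", max_posts=500, pause=1.0):
    """Page through /r/<subreddit>/<listing> until max_posts or the listing ends."""
    import requests  # third-party; imported here so parse_listing works without it

    url = f"https://oauth.reddit.com/r/{subreddit}/{listing}"
    posts, after = [], None
    while len(posts) < max_posts:
        params = {"limit": "100"}
        if after:
            params["after"] = after  # cursor returned with the previous page
        res = requests.get(url, headers=headers, params=params)
        res.raise_for_status()
        page, after = parse_listing(res.json())
        posts.extend(page)
        if not after or not page:
            break  # no more pages available
        time.sleep(pause)  # pause between requests to be gentle on the API
    return posts[:max_posts]
```

For example, `fetch_posts(headers, "EatingDisorders", listing="top")` would collect up to 500 top posts. Keeping the JSON parsing in a separate function makes it easy to test without network access.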


Hope it helps.
