RASA and web scraping

forwitai · June 25, 2020, 9:27am

Hello everyone, I’m completely new to Rasa and trying my best to learn how to use it.

I’m currently working on an academic project: a chatbot for the university’s website that answers any question about the university using content retrieved from the website itself. I figured the way to achieve this would be web scraping, but I don’t have a clear idea of how to do it. I’ve already managed to train my bot on several questions the classic way, by providing the answers directly in the responses.md file. Now I would like to be able to answer any kind of question. How can I identify what the user is asking about, and how do I look for an answer on the website? I know I should use a custom action, but am I supposed to cover every possible question in the NLU file, or would that be done automatically? And how do I map the question to the scraped answer and make sure it’s the right one?

Thanks in advance.

amn41 · June 25, 2020, 12:59pm

hi @forwitai - welcome to the forum and cool that you’re learning Rasa!

I would strongly advise that you start by covering just a few FAQs (just use your best guess of what people will ask) and then get some test users to try out the bot (maybe you can show it to a limited number of website viewers, for example).

Users will always surprise you with what they ask, and this will help you focus on answering the questions that actually come up. I think you could easily waste a lot of time on the scraping implementation without actually helping users.

forwitai · June 25, 2020, 6:39pm

Hello Alan, thank you for the quick reply! I’ve already managed to cover a few obvious questions, but the website contains far more. I intend to develop a chatbot that can answer any visitor question from the website’s content and save them time, which is why I thought about web scraping. My idea is to use scraping to build, for example, a CSV file with two columns, intent and answer, then use Rasa NLU to detect the intent and a custom action to look up the answer for that intent in the CSV file. The FAQ part is a first step; I will surely have contextual conversations later, where I will need Rasa Core, but not for now. Can you please tell me what you think of this approach? Any advice is more than welcome. Thanks a lot!
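To make the CSV idea concrete, here is a minimal sketch of the lookup step only. It assumes a CSV with `intent` and `answer` columns as described above; the function names, the sample rows, and the fallback message are hypothetical and just for illustration.

```python
import csv
import io

def load_answers(csv_file):
    """Read rows with 'intent' and 'answer' columns into a dict keyed by intent."""
    reader = csv.DictReader(csv_file)
    return {row["intent"].strip(): row["answer"].strip() for row in reader}

def lookup_answer(answers, intent,
                  fallback="Sorry, I don't have an answer for that yet."):
    """Return the stored answer for the detected intent, or a fallback message."""
    return answers.get(intent, fallback)

# Hypothetical sample data standing in for the CSV built from the scraped site.
SAMPLE_CSV = """intent,answer
ask_admission_deadline,Applications close on the date listed on the admissions page.
ask_library_hours,Opening hours are published on the library page.
"""

answers = load_answers(io.StringIO(SAMPLE_CSV))
print(lookup_answer(answers, "ask_library_hours"))
print(lookup_answer(answers, "ask_parking"))  # unknown intent, prints the fallback
```

Inside a Rasa custom action, the detected intent name should be available as `tracker.latest_message["intent"]["name"]`, and the result can be sent back with `dispatcher.utter_message(text=...)`. Note that this only moves the answers out of the responses file: every intent in the CSV still needs its own training examples in the NLU data.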

finesto · April 5, 2021, 12:19am

Hello! I’m currently developing a chatbot for a university website as a project too! This post really aligns with my needs.

I’ve also been trying to figure out a way to dynamically fetch content on a website that the chatbot could spit out easily. I’ve been looking at web scrapers, databases, and even json files (just putting it all in there).

Seeing as this post is already a year old, I was wondering whether you had any updates or links that could really help me (or perhaps I could take a peek at your chatbot’s source code).

Thank you.

pkchoudhary1211 · June 24, 2025, 2:23pm

I’m working on a similar chatbot for my uni’s site, and honestly, getting answers straight from the website in real time has been the biggest headache. I tried going with a big CSV at first, but keeping it updated and matching user questions was more complicated than I expected, especially once the content on the site changes. I started exploring web scraping as a way to keep my data fresh, and integrating it with Rasa custom actions was a pretty fun challenge, though mapping intents to answers automatically is still a work in progress for me.

I have found that using an API-based crawler beats building scrapers from scratch, especially since I don’t have to worry as much about being blocked or having to deal with captcha errors all the time. If you want to look into this route as well, I stumbled on options like https://crawlbase.com/ that save a lot of setup time and could help keep your knowledge base up to date for less manual work.
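For anyone wondering what the matching step could look like without a hand-maintained CSV, here is a rough sketch, not what I actually run. It assumes the page has already been fetched and that answers sit in `<p>` tags under `<h1>` to `<h3>` headings; `SectionParser`, `best_section`, and the sample HTML are made-up names and data. It uses only the Python standard library and picks the section sharing the most words with the question.

```python
from html.parser import HTMLParser
import re

class SectionParser(HTMLParser):
    """Collect [heading, body] pairs: each h1-h3 heading starts a new section."""

    HEADINGS = ("h1", "h2", "h3")

    def __init__(self):
        super().__init__()
        self.sections = []        # list of [heading_text, paragraph_text]
        self._in_heading = False
        self._in_para = False

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._in_heading = True
            self.sections.append(["", ""])
        elif tag == "p":
            self._in_para = True

    def handle_endtag(self, tag):
        if tag in self.HEADINGS:
            self._in_heading = False
        elif tag == "p":
            self._in_para = False

    def handle_data(self, data):
        if self._in_heading:
            self.sections[-1][0] += data
        elif self._in_para and self.sections:
            self.sections[-1][1] += data

def tokens(text):
    """Lowercase word set used for a crude overlap score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def best_section(question, sections):
    """Return the (heading, body) sharing the most words with the question, or None."""
    if not sections:
        return None
    q = tokens(question)
    scored = [(len(q & tokens(h + " " + b)), h, b) for h, b in sections]
    score, heading, body = max(scored, key=lambda s: s[0])
    return (heading, body) if score > 0 else None

# Hypothetical HTML standing in for a scraped university page.
SAMPLE_HTML = """
<h2>Library opening hours</h2><p>The library is open on weekdays.</p>
<h2>Admissions</h2><p>Apply through the online portal.</p>
"""

parser = SectionParser()
parser.feed(SAMPLE_HTML)
print(best_section("When is the library open?", parser.sections))
```

Word overlap is a very blunt measure (common words like "the" count as matches), so treat it as a starting point; the same function could be called from a custom action when no trained intent fits the question.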
