RASA and web scraping

Hello everyone, I’m completely new to Rasa and trying my best to learn how to use it.

Currently, I’m working on an academic project: I’m trying to build a chatbot for the university’s website that would answer any question related to the university with answers retrieved from the website itself. I figured that the way to achieve this would be web scraping. Again, I’m completely new to Rasa and I don’t have a clear idea of how to achieve this. I’ve already managed to train my bot on several questions using the classic method of providing the answers directly in the responses.md file. Now I would like to be able to answer any kind of question, so how can I identify what the user is asking about, and how do I look for an answer on the website? I know I should use a custom action, but am I supposed to cover every possible question in the NLU file, or would that be done automatically? And how do I map the question to the scraped answer and make sure it’s the right one?

Thanks in advance.

hi @forwitai - welcome to the forum and cool that you’re learning Rasa!

I would strongly advise that you start by covering just a few FAQs (just use your best guess of what people will ask) and then get some test users to try out the bot (maybe you can show it to a limited number of website viewers, for example).

Users will always surprise you with what they ask, and this will help you focus on answering the questions that actually come up. I think you could easily waste a lot of time on the scraping implementation without actually helping users.

Hello Alan, thank you for the quick reply! I’ve already managed to cover a few obvious questions, but the website contains far more. I intend to develop a chatbot that would answer any question visitors ask using the website’s content and save them time, which is why I thought about web scraping. I’m thinking about using it to build, for example, a CSV file with two columns, intent and answer, then using Rasa NLU and a custom action that would look up the answer in the CSV file according to the detected intent. The FAQ part is a first step; I will surely have contextual conversations where I will need Rasa Core, but not for now. Can you please tell me what you think about this approach? Any advice is more than welcome. Thanks a lot!
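
For anyone reading along, the CSV lookup idea described above could be sketched roughly like this. This is only a minimal sketch using the Python standard library: the intent names, the sample answers, the fallback text and the 0.6 confidence threshold are all made-up placeholders, and in a real bot the CSV would come from the scraper rather than from an inline string.

```python
import csv
import io

# Hypothetical scraped data; in practice this would be a CSV file
# written by the scraper, with one row per intent.
SAMPLE_CSV = """intent,answer
ask_admission_deadline,Applications close on the date published on the admissions page.
ask_library_hours,Opening hours are listed on the library page.
"""

# Placeholder fallback message (an assumption, not from the thread).
FALLBACK = "Sorry, I could not find an answer to that on the website."


def load_answers(csv_file):
    """Read rows with 'intent' and 'answer' columns into a dict."""
    reader = csv.DictReader(csv_file)
    return {row["intent"].strip(): row["answer"].strip() for row in reader}


def lookup_answer(answers, intent_name, confidence, threshold=0.6):
    """Return the stored answer, or the fallback when the intent is
    unknown or the NLU confidence is below the (assumed) threshold."""
    if confidence < threshold:
        return FALLBACK
    return answers.get(intent_name, FALLBACK)


answers = load_answers(io.StringIO(SAMPLE_CSV))
print(lookup_answer(answers, "ask_library_hours", 0.92))
```

Inside a Rasa custom action, the intent name and confidence would typically be read from `tracker.latest_message["intent"]` and the result sent back with `dispatcher.utter_message(text=...)`. Note that this still requires NLU training examples for every intent in the CSV; the lookup only replaces the hand-written responses, not the intent classification.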

Hello! I’m currently developing a chatbot for a university website as a project too! This post really aligns with my needs.

I’ve also been trying to figure out a way to dynamically fetch content from a website that the chatbot could spit out easily. I’ve been looking at web scrapers, databases, and even JSON files (just putting it all in there).
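
As a rough illustration of the scraper-to-JSON option, the sketch below pulls the text under each heading of a page into a dictionary and dumps it as JSON. It uses only Python's standard `html.parser`. The sample HTML and the assumption that each topic sits under an `<h2>` followed by `<p>` tags are hypothetical, and a real site would almost certainly need different selectors.

```python
import json
from html.parser import HTMLParser


class SectionParser(HTMLParser):
    """Collect the paragraph text that follows each <h2> heading."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # heading text -> joined paragraph text
        self._heading = None  # most recent <h2> text seen
        self._tag = None      # tag we are currently inside (h2 or p)

    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._tag is None:
            return
        if self._tag == "h2":
            self._heading = text
            self.sections.setdefault(text, "")
        elif self._heading is not None:
            joined = self.sections[self._heading] + " " + text
            self.sections[self._heading] = joined.strip()


# Hypothetical page fragment standing in for a downloaded page.
SAMPLE_HTML = """
<h2>Library</h2><p>Opening hours are listed here.</p>
<h2>Admissions</h2><p>See the admissions page.</p><p>Contact the office for details.</p>
"""


def extract_sections(html):
    """Parse an HTML string and return {heading: text}."""
    parser = SectionParser()
    parser.feed(html)
    parser.close()
    return parser.sections


sections = extract_sections(SAMPLE_HTML)
print(json.dumps(sections, indent=2))
```

The resulting JSON could then be loaded by the bot at startup, so pages only need to be re-scraped when the site changes.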

Seeing as this post is already a year old, I was wondering whether you had any updates or links that you used that could really help me (perhaps taking a peek at your chatbot source code).

Thank you.