Our Rasa chatbot should have some intents designated to querying and aggregating data from a database (for every table field and operation!).
Rasa doesn’t seem to be the right tool for that since it doesn’t has any ML based SQL generation out of the box. Writing a SQL generation layer manually is very tedious and error prone. We would like to avoid that.
There are many papers on this topic under keywords like seq2SQL, nl2SQL, Text-to-SQL (it seems to be a hot topic currently, last publication is just 5 days old).
We would like to use Rasa as a middleman and only a seq2SQL model in the backend (after Rasa). Rasa should keep the context and the seq2SQL model should provide the data for a given request (based on context).
We are not sure how to go about this. Did anybody in this community or at Rasa worked on a similar problem and likes to share?
What we can share on this so far is a list of papers in the field for which GitHub implementations exist:
This is a super interesting and ambitious project. In fact, we were working on something similar when we decided to start building Rasa!
Writing a SQL generation layer manually is very tedious and error prone
Yes, but training a neural net to do the same is also difficult remember that you will have to create a model for the specific schema you are working with. So I would strongly doubt you can take something off the shelf and work with it. Do you have data you can train on?
My advice would be to start simple. Define entities for the things you need (essentially the SQL verbs). Your training data would look something like:
[How many](aggregation) [purchases](item) were made [yesterday](filter:date)?
Show me the [total](aggregation) number of [customers](item) by [country](groupby)
You can then combine those entities to write a SQL query for your DB.
Once you have something (basic) that’s working, you can give it to people to test and generate more training data. I suspect you’ll need quite a lot in order to train one of the neural models
The way you recommended is how we are currently going about this. Until now this works great with our self-created training set.
The way of starting with a small set and basic functionality to later expand on this with user data is also how we planned it.
The problem that we see with that is that when requests become more complex (for example with multi table conditions … and field selections) building valid SQL queries from these entities can get very complicated. We expect a ton of special cases that all need to be programmed in.
Maybe plugging in an additional model is overkill but I am currently not aware of any python library that could bridge the way between entities and valid SQL queries.
How did your company solve this part? Did Rasa suffice on all your requirements in regards to this use case?
I’m afraid I don’t have a simple answer to this - it’s a hard problem to solve in the general case and definitely an opportunity for you to build something defensible!
Just to share our experience, we found that people never expressed very complicated queries in a single utterance. “Please show me all mexican restaurants within 3 miles with wifi but not in the tenderloin” is something you’re unlikely to see. It’s much more common to get something ambiguous (e.g. “I’m hungry”) and then having a conversation to figure out the details. Depends on your target users of course but that was our experience.
Hi guys! I’m working on a similar project and thought RASA will be better for me. I have seen a paid service kueri.me, Which offers to give a natural language interface to your database. I want to start with something basic then maybe go to harder queries. Did you guys find any new advancements in the field?
we now mainly use Rasa with training data that annotates and abstracts general database concepts like tables, field, …
A general select query with table and field as well as aggregations, number of requested entries, … works very well. Defining filters with free text are more hard. Currently we use Rasa Forms for that.
But what I can share is that we switched to ponyORM to talk to the SQL database. Since it works in a pythonic way and we don’t need to define SQL queries from intents and entities anymore it makes everything much more easy (not perfect).
From a chatbot user standpoint - no, an user can only request data for one subject (that is how we handle tables). But these tables can also be views.
Yes, that heavily limits the capabilities but as @amn41 stated, most queries are not that complex anyway.
Hi @JiteshGaikwad, Thanks for the reply but i wasn’t looking for this. I need a code which can convert English to SQL and retrieve me data from database.
hi @siddharthchauhan - translating English to SQL is a very difficult problem. If you look at my comment above, you can get an idea of how you can build a limited version of this for a particular database schema.
hi @malarsarav - these are good papers! but as I mentioned above, this is a research area. So if you are willing to put in the work, you can build something that others don’t have!
similar requirement. currently, we could query a single table using nl/voice. i am planing to transfer our app to Rasa in order not to using too many rules.