Rasa NLU is a powerful, open-source natural language processing tool for intent classification and entity extraction in chatbots.
However, we could not find any published public datasets or corresponding benchmarks for it. This makes it difficult to evaluate the performance of our own NLU system built with Rasa.
Therefore, we started a project that aims to collect and organize datasets and baselines for task-oriented dialogue. The datasets are in the data format required by Rasa NLU, so you can use them directly in your Rasa NLU system.
I’m interested to see the supervised embeddings achieving fairly high accuracy with small amounts of data (on AskUbuntuCorpus). Actually, I’d suggest you run these datasets through the NLU model comparison script and report how well the models perform on them with different occlusions! The script would also produce some informative graphs for the repository.
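To illustrate what running "with different occlusions" could look like, here is a minimal sketch in plain Python. It assumes that an occlusion means holding out a fixed percentage of the training examples for each intent before training. The function name `exclude_fraction` and the toy data are hypothetical, and this is not the comparison script's own code.

```python
import random
from collections import defaultdict


def exclude_fraction(examples, fraction, seed=0):
    """Return a copy of `examples` with `fraction` of each intent's examples dropped.

    `examples` is a list of (text, intent) pairs. At least one example
    is kept per intent so that no intent disappears entirely.
    """
    rng = random.Random(seed)

    # Group examples by intent so the exclusion is applied per intent.
    by_intent = defaultdict(list)
    for text, intent in examples:
        by_intent[intent].append((text, intent))

    kept = []
    for intent in sorted(by_intent):
        items = list(by_intent[intent])
        rng.shuffle(items)
        n_keep = max(1, round(len(items) * (1 - fraction)))
        kept.extend(items[:n_keep])
    return kept


# Toy data (hypothetical): 20 examples for each of two intents.
examples = [
    (f"utterance {i}", "greet" if i % 2 == 0 else "goodbye")
    for i in range(40)
]

for pct in (0, 25, 50, 75):
    subset = exclude_fraction(examples, pct / 100)
    print(f"{pct}% excluded -> {len(subset)} training examples")
```

Each reduced subset would then be used to train a model that is evaluated on the same fixed test set, so the scores can be compared across exclusion levels.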
Thanks, I will add experiments and report how well the models perform on these datasets with different occlusions!
And do you mean this work can count as a contribution to the Rasa repo?
Well, it’s not a direct contribution to the GitHub repo, but we consider it a contribution since you put in the work to help other Rasa Community members. We love to see these kinds of projects.
Hello, I am trying to benchmark my dataset, but I do not know how to set the amount of data that Rasa uses for the benchmark. I did a split of my data. Thanks.