Rasa-nlu-benchmark: Collection of dataset and corresponding benchmark for Rasa NLU

Project link: GitHub - nghuyong/rasa-nlu-benchmark

Rasa NLU is a powerful open-source natural language processing tool for intent classification and entity extraction in chatbots.

However, we could not find any publicly available dataset with a corresponding benchmark for it. This makes it difficult to evaluate the performance of an NLU system built with Rasa.

We therefore started a project that aims to collect and organize datasets and baselines for task-oriented dialogue. The datasets are provided in the data format required by Rasa NLU, so you can use them directly in your own Rasa NLU system.

You are welcome to star the repository and contribute~


Wow! Amazing work @nghuyong!

I’m interested to see the supervised embeddings achieving fairly high accuracy with small amounts of data (on AskUbuntuCorpus). Actually, I’d ask that you run these datasets through the NLU model comparison script and report how well the models perform on them with different occlusions! The script would also produce some informative graphs for the repository.

I hope you know about the Rasa contributor program too. I believe this warrants a reward :slight_smile:

Thanks, I will add experiments and report how well the models perform on these datasets with different occlusions! And do you mean that this work can count as a contribution to the Rasa repo?

Well, it’s not a direct contribution to the GitHub repo, but we consider it a contribution since you put in the work to help other Rasa Community members :slight_smile: We love to see these kinds of projects.

We have added the Comparing NLU Pipelines experiments~

Hello, I am trying to benchmark my own dataset, but I do not know how to set the amount of data that Rasa uses for the benchmark. I have already split my data. Thanks.
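
One way to control how much data goes into a benchmark run, without relying on any Rasa tooling, is to subsample the training split yourself before training. Below is a minimal sketch of that idea, assuming the training examples are available as `(text, intent)` pairs. The names `subsample_per_intent` and `exclude_pct` are hypothetical and are not part of Rasa; converting the result back into Rasa's training data format is left out.

```python
import random
from collections import defaultdict


def subsample_per_intent(examples, exclude_pct, seed=0):
    """Drop roughly `exclude_pct` percent of the examples of each intent.

    `examples` is a list of (text, intent) pairs. At least one example
    per intent is always kept so that no intent disappears entirely.
    """
    if not 0 <= exclude_pct < 100:
        raise ValueError("exclude_pct must be in [0, 100)")

    rng = random.Random(seed)  # fixed seed keeps the subset reproducible

    # Group the examples by intent so every intent is thinned separately.
    by_intent = defaultdict(list)
    for text, intent in examples:
        by_intent[intent].append((text, intent))

    kept = []
    for intent in sorted(by_intent):
        items = list(by_intent[intent])
        rng.shuffle(items)
        # Number of examples to keep for this intent (rounded down, minimum 1).
        n_keep = max(1, int(len(items) * (100 - exclude_pct) / 100))
        kept.extend(items[:n_keep])
    return kept


if __name__ == "__main__":
    # Toy data, purely illustrative.
    data = [(f"hello {i}", "greet") for i in range(10)]
    data += [(f"bye {i}", "goodbye") for i in range(6)]
    for pct in (0, 25, 50, 75):
        subset = subsample_per_intent(data, pct)
        print(f"exclude {pct}% -> {len(subset)} training examples")
```

Training once per exclusion percentage on the reduced set and evaluating each model on the same held-out test split gives a curve of performance against training set size.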