Analyse intent / entity distribution

Hello everyone! For big NLU files, I need a way to check which entities are present in which intents and how often they occur. Ideally, there would also be a way to see which words are mapped to the entities. I couldn’t find any existing solution for this.

Do you know of any tools / scripts I can use for this, or do I have to write them from scratch?

Best regards and thank you for your help!

Hi Mike,

Do you want to do this against your training data or user conversations?

Greg

Hi Greg,

I want to do this against my training data. I want to compare different approaches of labeling the data and see which performs the best. But to make sure that my data actually follows the intended approach, I need a way to check the data and catch any data that was wrongly labeled.

Mike

Hi @anyone. I’m interested in doing this exact thing. Any updates on this? Thanks.

@stephens, does Rasa provide functionality for this? Thanks.

I want to compare different approaches of labeling the data and see which performs the best

For this part of your question, you can use rasa test nlu.

I need a way to check which entities are present in which intents and how often they occur.

I don’t know of an existing solution for this. Maybe one of @koaning’s projects?
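If you end up writing it from scratch, a minimal sketch could look like this. It assumes Rasa 2.x/3.x YAML training data that has already been loaded (for example with PyYAML's yaml.safe_load) into the list under the top-level nlu: key, and it only understands the [text](entity) and [text]{"entity": ...} annotation forms. The function names are hypothetical, not part of Rasa.

```python
import json
import re
from collections import Counter, defaultdict

# Matches "[Berlin](city)" and '[Berlin]{"entity": "city", ...}' annotations.
# Assumption: these are the only annotation forms used in the data.
ANNOTATION = re.compile(
    r"\[(?P<text>[^\]]+)\](?:\((?P<name>[^)]+)\)|(?P<meta>\{[^}]*\}))"
)


def iter_annotations(example):
    """Yield (entity_name, annotated_text) for every annotation in one example."""
    for match in ANNOTATION.finditer(example):
        if match.group("name") is not None:
            name = match.group("name").strip()
        else:
            name = json.loads(match.group("meta"))["entity"]
        yield name, match.group("text")


def split_examples(block):
    """Split a Rasa "- example" block string into individual example strings."""
    return [
        line.strip()[2:].strip()
        for line in block.splitlines()
        if line.strip().startswith("- ")
    ]


def entity_stats(nlu_items):
    """Return (entity counts per intent, annotated-word counts per entity).

    nlu_items is the list under the top-level "nlu:" key; entries without an
    "intent" key (synonyms, regexes, lookups) are skipped.
    """
    per_intent = defaultdict(Counter)
    words = defaultdict(Counter)
    for item in nlu_items:
        if "intent" not in item:
            continue
        for example in split_examples(item.get("examples", "")):
            for name, text in iter_annotations(example):
                per_intent[item["intent"]][name] += 1
                words[name][text.lower()] += 1  # case-folded surface forms
    return per_intent, words


if __name__ == "__main__":
    # With PyYAML installed, this list would typically come from
    # yaml.safe_load(open("data/nlu.yml"))["nlu"]  (the path is an assumption).
    nlu = [
        {"intent": "book_flight",
         "examples": '- fly to [Berlin](city)\n- go to [Paris]{"entity": "city"} from [Rome](city)\n'},
        {"intent": "greet", "examples": "- hi\n- hello from [berlin](city)\n"},
    ]
    per_intent, words = entity_stats(nlu)
    for intent, counts in per_intent.items():
        print(intent, dict(counts))
    for entity, counts in words.items():
        print(entity, counts.most_common())
```

Lower-casing the annotated words merges "Berlin" and "berlin" into one entry; drop the .lower() call if casing matters for your labeling approach.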

I need a way to check the data and catch any data that was wrongly labeled.

I don’t think rasa data validate checks for conflicting entity labels, but that would be useful. You might want to submit an enhancement request for it.
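Until something like that exists, a rough stand-alone check is possible. The sketch below (hypothetical helper names; it assumes only the [text](entity) and [text]{"entity": ...} annotation forms) takes (intent, example) pairs and flags two things: texts annotated with more than one entity name, and examples where a text that is annotated elsewhere appears without any annotation. Both are only candidates for manual review, since the same word can legitimately map to different entities in different contexts.

```python
import json
import re
from collections import defaultdict

# Assumption: annotations look like "[Berlin](city)" or '[Berlin]{"entity": "city"}'.
ANNOTATION = re.compile(
    r"\[(?P<text>[^\]]+)\](?:\((?P<name>[^)]+)\)|(?P<meta>\{[^}]*\}))"
)


def entity_name(match):
    """Entity name of one annotation match (short or JSON form)."""
    if match.group("name") is not None:
        return match.group("name").strip()
    return json.loads(match.group("meta"))["entity"]


def find_label_conflicts(examples):
    """Check (intent, example) pairs for inconsistent entity labels.

    Returns (conflicts, unlabeled):
      conflicts: {text: set of entity names} for texts labeled with >1 entity
      unlabeled: (intent, example, text) where an annotated text appears unannotated
    Matching is case-insensitive.
    """
    examples = list(examples)
    labels = defaultdict(set)
    for _, example in examples:
        for match in ANNOTATION.finditer(example):
            labels[match.group("text").lower()].add(entity_name(match))
    conflicts = {text: names for text, names in labels.items() if len(names) > 1}

    unlabeled = []
    for intent, example in examples:
        plain = ANNOTATION.sub(" ", example).lower()  # remove annotated spans
        for text in labels:
            # whole-word match that also works for texts like "$5" or "c++"
            if re.search(r"(?<!\w)" + re.escape(text) + r"(?!\w)", plain):
                unlabeled.append((intent, example, text))
    return conflicts, unlabeled


if __name__ == "__main__":
    data = [
        ("book_flight", "fly to [Berlin](city)"),
        ("book_flight", 'fly to [berlin]{"entity": "destination"}'),
        ("greet", "hello from berlin"),
    ]
    conflicts, unlabeled = find_label_conflicts(data)
    print(conflicts)
    print(unlabeled)
```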

I wrote doubtlab, a tool aimed mostly at the scikit-learn ecosystem that can help you find bad labels in your dataset. It may help with intents, but not so much with entities at the moment. You can also check out cleanlab for this task.
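doubtlab and cleanlab have their own APIs, which are not reproduced here. As a dependency-free illustration of one signal such tools build on (an example's given label disagreeing with a model that did not see that example during training), the sketch below swaps in leave-one-out multinomial naive Bayes over bag-of-words intent examples. The function name, the tokenizer and the smoothing value are all assumptions, and a real NLU pipeline would give much better predictions than this toy model.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-cased word tokens (a deliberately simple assumption)."""
    return re.findall(r"\w+", text.lower())


def flag_suspicious_intents(examples, alpha=1.0):
    """Return (index, given_intent, predicted_intent) for examples whose label
    disagrees with a naive Bayes model trained on all *other* examples.

    examples: list of (intent, text) pairs; alpha: Laplace smoothing.
    """
    docs = [(intent, Counter(tokenize(text))) for intent, text in examples]
    vocab_size = len({tok for _, counts in docs for tok in counts})
    # Totals over the full dataset; each example's own share is subtracted
    # below, which approximates retraining without it (vocabulary kept fixed).
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for intent, counts in docs:
        word_counts[intent].update(counts)
        doc_counts[intent] += 1
    token_totals = {i: sum(c.values()) for i, c in word_counts.items()}

    flagged = []
    for idx, (given, counts) in enumerate(docs):
        best, best_score = None, -math.inf
        for intent in doc_counts:
            own = counts if intent == given else Counter()
            n_docs = doc_counts[intent] - int(intent == given)
            if n_docs == 0:
                continue  # no other examples left for this intent
            n_tokens = token_totals[intent] - sum(own.values())
            score = math.log(n_docs / (len(docs) - 1))  # log prior
            for tok, k in counts.items():
                seen = word_counts[intent][tok] - own[tok]
                score += k * math.log((seen + alpha) / (n_tokens + alpha * vocab_size))
            if score > best_score:
                best, best_score = intent, score
        if best is not None and best != given:
            flagged.append((idx, given, best))
    return flagged
```

Each flagged index points back into the (intent, text) list you passed in, so you can inspect those examples by hand; disagreement only means "worth a second look", not "definitely wrong".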