For big NLU files, I need a way to check which entities are present in which intents and how often they occur. Ideally, there would also be a way to see which words are mapped to each entity. I didn't find any existing solutions for this.
Do you know of any tools or scripts I can use for this? Or do I have to write them from scratch?
I want to run this against my training data. I want to compare different approaches to labeling the data and see which one performs best. But to make sure my data actually follows the intended approach, I need a way to check it and catch any mislabeled examples.
I wrote doubtlab, a general-purpose tool aimed mainly at the scikit-learn ecosystem, which can help you find bad labels in your dataset. It may help with intents, but not so much with entities at the moment. You can also check out cleanlab for this task.
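If no ready-made tool fits, the per-intent entity counts asked about in the question are fairly small to script yourself. Below is a minimal sketch, assuming the files use Rasa's YAML training-data format (an `nlu:` list whose items have an `intent:` key and an `examples: |` block of `- ` lines) and have already been loaded into a dict, for example with PyYAML's `yaml.safe_load`. The helper names `iter_examples`, `extract_entities` and `entity_report` are hypothetical, not part of any Rasa API, and only the `[text](entity)`, `[text](entity:value)` and `[text]{"entity": ...}` annotation forms are handled.

```python
import json
import re
from collections import Counter, defaultdict

# Matches "[text](entity)" and the older "[text](entity:value)" synonym form.
PAREN_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
# Matches "[text]{...}" annotations whose body is JSON, e.g. {"entity": "city"}.
BRACE_RE = re.compile(r"\[([^\]]+)\]\{([^}]*)\}")


def iter_examples(nlu_data):
    """Yield (intent, example) pairs from a dict shaped like Rasa's YAML NLU data.

    Assumes the dict came from something like yaml.safe_load(open("nlu.yml")).
    """
    for item in nlu_data.get("nlu", []):
        intent = item.get("intent")
        if intent is None:
            continue  # skip synonym, regex and lookup entries
        for line in item.get("examples", "").splitlines():
            line = line.strip()
            if line.startswith("- "):
                yield intent, line[2:]


def extract_entities(example):
    """Return (entity_name, annotated_text) pairs found in one example."""
    found = []
    for text, target in PAREN_RE.findall(example):
        # For "entity:value", keep only the entity name.
        found.append((target.split(":", 1)[0].strip(), text))
    for text, body in BRACE_RE.findall(example):
        try:
            meta = json.loads("{" + body + "}")
        except json.JSONDecodeError:
            continue  # skip malformed annotations instead of crashing
        if "entity" in meta:
            found.append((meta["entity"], text))
    return found


def entity_report(pairs):
    """Count entities per intent and the words annotated as each entity."""
    per_intent = defaultdict(Counter)  # intent -> Counter(entity -> count)
    words = defaultdict(Counter)  # entity -> Counter(annotated text -> count)
    for intent, example in pairs:
        for entity, text in extract_entities(example):
            per_intent[intent][entity] += 1
            words[entity][text] += 1
    return per_intent, words


if __name__ == "__main__":
    data = {
        "nlu": [
            {"intent": "book_flight",
             "examples": "- fly from [Berlin](city) to [Paris](city)\n"
                         '- a flight to [Rome]{"entity": "city", "role": "destination"}\n'},
            {"intent": "greet", "examples": "- hi\n- hello [Anna](name)\n"},
        ]
    }
    per_intent, words = entity_report(iter_examples(data))
    for intent, counts in per_intent.items():
        print(intent, dict(counts))
    for entity, texts in words.items():
        print(entity, texts.most_common())
```

Listing `words[entity].most_common()` for each entity makes it easy to spot annotated text that looks out of place for that entity, which is one way to catch mislabeled examples before comparing labeling approaches.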