Hello everyone,
I am currently working on extracting entity types and groups from a corpus, and Rasa NLU seems to be a handy tool for the task. But since predicting entity groups is a fairly new feature, I have not found much information about it elsewhere. So I have two questions on the topic:
When I label the data in Markdown format, is it necessary to label every entity with a group, or only those entities that belong together? For example:
Variant A: I want to buy [red]{"entity": "color", "group": "1"} [shoes]{"entity": "clothing", "group": "1"}, [jeans]{"entity": "clothing", "group": "2"} and a [hat]{"entity": "clothing", "group": "3"}.
Variant B: I want to buy [red]{"entity": "color", "group": "1"} [shoes]{"entity": "clothing", "group": "1"}, [jeans]{"entity": "clothing"} and a [hat]{"entity": "clothing"}.
So is variant A right, or variant B, or does it make no difference at all?
Also, I do not need intent prediction for the task I am working on. But DIET still seems to be a better choice for it than CRFEntityExtractor, since it contains additional transformer layers, which should improve performance, whereas the extracted features are passed directly to CRFEntityExtractor. Is that right? Will DIET automatically "understand" that I do not need intent classification if there are no intent labels in the training data, or do I have to specify that in the config.yml?
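To make the second question concrete, this is roughly the config.yml I have in mind. It is only a sketch: I am assuming that the intent_classification and entity_recognition options of DIETClassifier are the right switches here, and the tokenizer, featurizer and epochs value are placeholders rather than a tested pipeline.

```yaml
# Sketch of an entity-only pipeline (assumed, not tested).
language: en
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    intent_classification: false   # assumption: disables the intent head
    entity_recognition: true       # keep entity (and group) prediction
    epochs: 100                    # placeholder value
```

If the flag is not needed when the training data has no intent labels, I would rather leave it out.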
Thank you in advance for any answers.