Setup data for entity group recognition

harloc · September 22, 2020, 1:04pm

Hello everyone,

I am currently working on extracting entity types and groups from a corpus, and Rasa NLU seems to be a handy tool for the task. But as the feature of predicting entity groups is quite new, I did not find much information on it elsewhere. So I have two questions related to the topic:

When I label the data in Markdown format, is it necessary to label every entity with a group, or only those entities that belong together? For example:

Variant A: I want to buy [red]{"entity": "color", "group": "1"} [shoes]{"entity": "clothing", "group": "1"}, [jeans]{"entity": "clothing", "group": "2"} and a [hat]{"entity": "clothing", "group": "3"}.

Variant B: I want to buy [red]{"entity": "color", "group": "1"} [shoes]{"entity": "clothing", "group": "1"}, [jeans]{"entity": "clothing"} and a [hat]{"entity": "clothing"}.

So is variant A right, or variant B, or does it make no difference at all?

Also, I do not need intent prediction for the task I am working on. DIET still seems to be a better choice than CRFEntityExtractor, since it contains additional transformer layers that should improve performance, whereas in CRFEntityExtractor the extracted features are passed directly to the CRF. Is that right? Will DIET automatically “understand” that I do not need intent classification if there are no intent labels in the training data, or do I have to specify that in config.yml?

Thank you in advance for any answers.

Tanja · September 23, 2020, 9:18am

Hi @harloc, great to see that you are looking into this feature.

Regarding your questions:

(1) I would say it depends on what you actually want to do with the extracted entities later on. In variant A you have three separate groups, whereas in variant B you would only know that red and shoes belong together, and it might not be clear how to process jeans and hat. So both annotations are valid; which one to use depends on what kind of information you need later on.
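To make the difference concrete, here is a minimal sketch of how downstream code might bucket extracted entities by group. It assumes each extracted entity is a dict that carries a "group" key only when a group was predicted; the function name `group_entities` and the sample lists are hypothetical and simply mirror the two variants above.

```python
from collections import defaultdict


def group_entities(entities):
    """Split entities into per-group buckets and a list of ungrouped ones.

    Assumption: an entity without a predicted group has no "group" key.
    """
    grouped = defaultdict(list)
    ungrouped = []
    for entity in entities:
        group = entity.get("group")
        if group is None:
            ungrouped.append(entity)
        else:
            grouped[group].append(entity)
    return dict(grouped), ungrouped


# Hypothetical extraction results mirroring variant A and variant B.
variant_a = [
    {"entity": "color", "value": "red", "group": "1"},
    {"entity": "clothing", "value": "shoes", "group": "1"},
    {"entity": "clothing", "value": "jeans", "group": "2"},
    {"entity": "clothing", "value": "hat", "group": "3"},
]
variant_b = [
    {"entity": "color", "value": "red", "group": "1"},
    {"entity": "clothing", "value": "shoes", "group": "1"},
    {"entity": "clothing", "value": "jeans"},
    {"entity": "clothing", "value": "hat"},
]

groups_a, ungrouped_a = group_entities(variant_a)
groups_b, ungrouped_b = group_entities(variant_b)
```

With variant A every item ends up in its own bucket, so each bucket can be treated as one item to buy. With variant B, jeans and hat land in the ungrouped list, and the application has to decide for itself whether they are separate items.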

(2) It is not possible to define training data just for entities. So in your nlu.md file you need to define at least one dummy intent and add the examples for your entities under it. As you mentioned, DIET uses a transformer model, whereas the CRFEntityExtractor uses a standard CRF model. The features used by the two models also differ, as the CRFEntityExtractor creates its own features. If you want to learn more about the different options you have for the two models, see our docs. So far I have not seen any big performance difference between the two models when it comes to roles and groups. DIET might be a bit better, but it also takes longer to train, as you most likely need to increase the number of epochs. If you have the time, you can simply train both models and see which works better for you.
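As a rough illustration of the dummy-intent approach, the sketch below wraps annotated examples in a single intent section of the Markdown training data format. The helper `build_nlu_markdown` and the intent name `entity_only` are hypothetical placeholders, not part of Rasa.

```python
def build_nlu_markdown(examples, intent_name="entity_only"):
    """Return nlu.md content with all examples under one placeholder intent."""
    lines = [f"## intent:{intent_name}"]
    lines.extend(f"- {example}" for example in examples)
    return "\n".join(lines) + "\n"


# Annotated examples taken from the variants discussed above.
examples = [
    'I want to buy [red]{"entity": "color", "group": "1"} '
    '[shoes]{"entity": "clothing", "group": "1"}, '
    '[jeans]{"entity": "clothing", "group": "2"} '
    'and a [hat]{"entity": "clothing", "group": "3"}.',
]

nlu_md = build_nlu_markdown(examples)

# Writing the file is left to the caller, for example:
# with open("nlu.md", "w", encoding="utf-8") as handle:
#     handle.write(nlu_md)
```

The predicted intent can then be ignored at inference time and only the entities read from the result.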

Hope this helps. Let me know if you have any more questions.
