Handling multiple word entity

ashek1520 · October 13, 2021, 2:09pm

In domains such as medicine, many entities are composed of multiple tokens. For such entities, the meaning of an individual token can be very different from the meaning of the whole (multi-word) entity.

When I run NLU, Rasa recognizes a few of these entities, but in most cases it splits the entity into multiple single-token entities. Example: “Brake Pad” is split into “Brake” and “Pad”.

One reason, I think, could be the word vector model: it does not have a vector for the whole entity, so it builds one from subword-level embedding vectors. As these vectors do not carry much context-related information, the NER does not do well.

I am thinking of creating a new vector model for these types of entities, so that multi-word entities get a better vector representation.

My question: if I have a new vector model, do I need to do anything else in Rasa to handle multi-word tokens? My thoughts (please correct me if you have another suggestion or comment):

I think I will have to write a component to handle the new vector model.
Do I need to do anything else for the DIET classifier?
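
To make the idea concrete, here is a minimal sketch of what the lookup side of such a component might do: merge known multi-word phrases into single units before a vector is looked up. This is plain Python, not Rasa's component API. The phrase set, `MAX_PHRASE_LEN`, and `merge_phrases` are hypothetical names for illustration, and wiring this into a Rasa pipeline (for example as a custom tokenizer or featurizer) is not shown.

```python
from typing import List, Set

# Hypothetical phrase vocabulary: multi-word entries that the new vector
# model is assumed to store as single units.
PHRASES: Set[str] = {"brake pad", "brake disc rotor"}
MAX_PHRASE_LEN = 3  # length of the longest phrase, in tokens


def merge_phrases(tokens: List[str]) -> List[str]:
    """Greedily merge the longest known phrase starting at each position."""
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two tokens.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in PHRASES:
                merged.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here: keep the single token as it is.
            merged.append(tokens[i])
            i += 1
    return merged


print(merge_phrases("replace the Brake Pad today".split()))
# ['replace', 'the', 'Brake_Pad', 'today']
```

The merged unit ("Brake_Pad") could then be looked up in the new vector model as one entry instead of two unrelated tokens.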

Thank you, Abhishek

siriusraja · October 15, 2021, 7:40am

Hi @ashek1520

Are you using both CRF & DIET for entity extraction? If you are using DIET, ensure you have enough training examples with multiple word entities and then increase the epochs in the config file.

DIET requires more training examples and more epochs in this case.
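
To illustrate why multi-word examples matter: token-level extractors commonly use BILOU tagging (both DIETClassifier and CRFEntityExtractor have a `BILOU_flag` option), where a multi-word entity is only recovered if the model predicts a consistent begin/last tag sequence. The snippet below is a rough sketch of that tagging scheme in plain Python, assuming whitespace tokenization. It is not Rasa's internal code, and `bilou_tags` and the `part` label are made up for the example.

```python
from typing import List, Tuple


def bilou_tags(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Tag tokens with BILOU labels.

    Each span is (start_token, end_token_exclusive, label).
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token entity
        else:
            tags[start] = f"B-{label}"  # first token of the entity
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # tokens inside the entity
            tags[end - 1] = f"L-{label}"  # last token of the entity
    return tags


tokens = "my brake pad is worn".split()
print(list(zip(tokens, bilou_tags(tokens, [(1, 3, "part")]))))
# [('my', 'O'), ('brake', 'B-part'), ('pad', 'L-part'), ('is', 'O'), ('worn', 'O')]
```

If the training data mostly contains single-token entities, the model sees few B-/L- sequences and tends to predict two separate single-token entities instead.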

ashek1520 · November 9, 2021, 3:42pm

Yes, I am using DIET and CRF. OK, I will try adding more examples so that DIET can work with multi-word entities. Does the accuracy depend on which language model (word vector model) we are using, and on whether that word vector model has vectors for such multi-word entities?
