Different behavior when recognizing entities between pretrained_embeddings_spacy and supervised embeddings pipeline

fuih · October 30, 2019, 4:06am

Hello,

I’m building a multilingual chatbot, with a different model for each language. I’m using pretrained_embeddings_spacy for English and supervised_embeddings for the other languages.

I have noticed that the two pipelines differ in how they recognize entities:

The pretrained_embeddings_spacy pipeline recognizes both uppercase and lowercase entities even when only the lowercase forms are defined in the training data. Furthermore, it automatically converts uppercase entities in the user’s message to lowercase.
The supervised_embeddings pipeline, however, only recognizes uppercase or lowercase entities depending on how they are defined in the training data. It won’t detect lowercase entities if none appear in the training data (and vice versa), so you have to define both. Furthermore, it keeps the original casing of the entities after recognizing them.

What is the cause of this difference? If I’m not mistaken, they use the same CRFEntityExtractor. Is there a way to make supervised_embeddings behave the same as pretrained_embeddings_spacy when recognizing entities? It would be a little more convenient for me.

IgNoRaNt23 · October 30, 2019, 6:18am

The pretrained embeddings pipeline uses SpacyNLP (as the name says), which is case insensitive by default, so the text is lowercased before anything else happens. You can add

case_sensitive: true

under “SpacyNLP” in your pipeline config to make it case sensitive, i.e. to get the same behaviour as supervised_embeddings.

See Components

fuih · October 30, 2019, 6:30am

Thank you @IgNoRaNt23. I want to make supervised_embeddings case insensitive though, at least for now, because it would be convenient to recognize an entity whether it is uppercase or not (except for human name entities, I guess). Can I set case_sensitive to false for the supervised_embeddings pipeline?

IgNoRaNt23 · October 30, 2019, 6:34am

I’m not sure, but you can try. If that doesn’t work, you could write your own custom component that converts the message to lowercase. I’m also not sure whether you would then need to lowercase all your training data as well. But it’s probably not hard to do, so just try.
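
A minimal sketch of the lowercasing idea, in case it helps. It is written as plain Python so it runs on its own: `Message` is a hypothetical stand-in for Rasa’s message object, and `LowercasePreprocessor` is an illustrative name, not an existing Rasa component. In a real Rasa 1.x project the class would presumably subclass `rasa.nlu.components.Component` and be listed in the pipeline before the tokenizer. The exact method signatures there are an assumption to check against the Components docs.

```python
class Message:
    """Hypothetical stand-in for Rasa's message object; it only carries text."""

    def __init__(self, text):
        self.text = text


class LowercasePreprocessor:
    """Sketch of a custom component that lowercases text before entity extraction.

    Assumption: a real Rasa component would expose similar train/process hooks
    and receive Rasa's own training data and message objects instead.
    """

    name = "LowercasePreprocessor"

    def train(self, training_examples, **kwargs):
        # Lowercase the training examples too, so the extractor sees the
        # same casing at training time and at prediction time.
        for example in training_examples:
            example.text = example.text.lower()

    def process(self, message, **kwargs):
        # Lowercase the incoming user message in place.
        message.text = message.text.lower()


if __name__ == "__main__":
    component = LowercasePreprocessor()

    examples = [Message("Book a flight to Paris"), Message("I live in BERLIN")]
    component.train(examples)
    print([e.text for e in examples])

    incoming = Message("Fly me to PARIS")
    component.process(incoming)
    print(incoming.text)
```

Lowercasing both the training examples and the incoming message keeps the two consistent, which is the open question raised above. The trade-off is that extracted entity values come back in lowercase, as they already do with the spaCy pipeline.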
