Hello,
I’m building a multilingual chatbot, using different models for different languages. I’m using pretrained_embeddings_spacy for English, and supervised_embeddings for other languages.
I have noticed that the two behave differently when recognizing entities:
- pretrained_embeddings_spacy can recognize both uppercase and lowercase entities even when you only define the lowercase entities in the training data. Furthermore, it automatically converts uppercase entities in the user’s message to lowercase.
- supervised_embeddings, however, only recognizes uppercase or lowercase entities depending on how you define them in the training data. It won’t detect lowercase entities if there are none in the training data (and vice versa), so you have to define both. Furthermore, it keeps the original casing of the entities after recognizing them.
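To make the "define both" point concrete, here is a minimal sketch in plain Python (it is not part of Rasa's API). It assumes training examples annotated in the Markdown `[value](entity)` style and adds a copy of each example with the annotated value lowercased. The helper name `add_lowercase_variants` and the sample sentences are made up for illustration.

```python
import re

# Matches Markdown-style entity annotations such as [Paris](city).
ENTITY_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def add_lowercase_variants(examples):
    """Return the examples plus a copy of each with entity values lowercased.

    Hypothetical helper for illustration: it only rewrites the annotated
    value, not the rest of the sentence, and skips duplicates.
    """
    result = []
    for example in examples:
        lowered = ENTITY_PATTERN.sub(
            lambda m: "[{}]({})".format(m.group(1).lower(), m.group(2)),
            example,
        )
        for candidate in (example, lowered):
            if candidate not in result:
                result.append(candidate)
    return result


# Made-up training examples, one capitalized and one already lowercase.
examples = ["I live in [Paris](city)", "book a flight to [berlin](city)"]
augmented = add_lowercase_variants(examples)
```

This is only the manual workaround written out as code; it does not explain why the two pipelines differ.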
What is the cause of this difference? If I’m not mistaken, both use the same CRFEntityExtractor. Is there a way to make supervised_embeddings behave the same as pretrained_embeddings_spacy when recognizing entities? It would be a little more convenient for me.