Ner_crf extractor separating entity values that contain "-"

I have an entity whose values include something like “red-blue”, but in debug mode I see that three separate values were extracted for that one entity: “red”, “-”, and “blue”. In the end, the only value stored in my slot is “blue”, since it is the last one extracted. Is there any way to change just the splitting on the “-” character?

Any help would be very much appreciated. Thanks

Which version are you using?

In my case, if an entity contains a hyphen, the value is extracted as one entity but with spaces added around the hyphen. For example, red-blue is extracted as red - blue.

Check the _create_entity_dict() function in the crf_entity_extractor.py file.
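
If you would rather not patch the extractor itself, one possible workaround is to post-process the extracted entities before they reach the slot. The snippet below is only a rough sketch: merge_adjacent_entities is a hypothetical helper, not part of Rasa, and it assumes each entity is a dict with "start", "end", "value", and "entity" keys like the ones shown in debug mode. It merges same-type entities whose character spans touch, and takes the merged value straight from the original message text so no extra spaces are introduced.

```python
def merge_adjacent_entities(text, entities):
    """Merge same-type entities whose character spans touch in `text`.

    Hypothetical helper, not a Rasa API. Assumes each entity is a dict
    with "start", "end", "value" and "entity" keys.
    """
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["entity"] == ent["entity"]
            and prev["end"] == ent["start"]  # no gap between the two spans
        ):
            # Extend the previous entity and re-read its value from the
            # original text, so "red", "-", "blue" becomes "red-blue".
            prev["end"] = ent["end"]
            prev["value"] = text[prev["start"]:prev["end"]]
        else:
            merged.append(dict(ent))  # copy so the input is not mutated
    return merged


text = "I want a red-blue shirt"
entities = [
    {"start": 9, "end": 12, "value": "red", "entity": "color"},
    {"start": 12, "end": 13, "value": "-", "entity": "color"},
    {"start": 13, "end": 17, "value": "blue", "entity": "color"},
]
print(merge_adjacent_entities(text, entities))
```

Entities separated by a space or by other words are left alone, so "red and blue" would still give two values.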
