When supervised_embeddings (which uses CRFEntityExtractor) is used in the config (without any additional options) and evaluated, how are the performance metrics (precision, recall, F1, and accuracy) computed? I am asking because I found two contradictory pieces of information.
Which one is right? When entity extraction is evaluated with the default supervised_embeddings pipeline, is each entity an individual token or the whole B-I-L sequence?
Hi @onue5, we are using the first approach you mentioned:
According to Evaluating Models, entity scoring seems to be based on a simple tag-based approach, which splits multiword entities and evaluates performance per individual token.
To clarify, suppose that the tagging is ['O', 'B-per', 'I-per', 'L-per', 'O']. Would the corresponding entity (considered during the evaluation) be {"entity": "per", "start": 1, "end": 3}, where "entity", "start", and "end" represent the entity label, the start index, and the end index respectively? Or would it be {"entity": "per", "start": 1, "end": 1}, {"entity": "per", "start": 2, "end": 2}, {"entity": "per", "start": 3, "end": 3}?
The first one is correct: {"entity": "per", "start": 1, "end": 3}. The B, I, and L tags would be merged together.
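To make the merging concrete, here is a minimal sketch of how B-I-L tags could be collapsed into entity spans. This is an illustration of the idea, not Rasa's actual implementation; the function name `bilou_to_entities` is hypothetical.

```python
def bilou_to_entities(tags):
    """Merge BILOU tags into entity spans (illustrative sketch,
    not the actual Rasa code)."""
    entities = []
    start = None
    label = None
    for i, tag in enumerate(tags):
        if tag.startswith("U-"):
            # Unit-length entity: a single token on its own.
            entities.append({"entity": tag[2:], "start": i, "end": i})
        elif tag.startswith("B-"):
            # Begin a multi-token entity.
            start, label = i, tag[2:]
        elif tag.startswith("L-") and label == tag[2:]:
            # Last token: close the span opened by the matching B- tag.
            entities.append({"entity": label, "start": start, "end": i})
            start, label = None, None
    return entities

print(bilou_to_entities(["O", "B-per", "I-per", "L-per", "O"]))
# -> [{'entity': 'per', 'start': 1, 'end': 3}]
```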
Not 100% sure what you mean. We convert the BILOU format to the JSON format you can see above, and during evaluation we use that array. The entities are split into tokens again. Does that clarify your question?
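In other words, for scoring, the merged spans can be projected back onto per-token labels and compared token by token. A hedged sketch of that idea (the helper `entities_to_token_labels` and the variable names are made up for illustration):

```python
def entities_to_token_labels(entities, n_tokens):
    """Project entity spans back onto per-token labels (sketch,
    not the actual Rasa evaluation code)."""
    labels = ["O"] * n_tokens
    for e in entities:
        for i in range(e["start"], e["end"] + 1):
            labels[i] = e["entity"]
    return labels

# Gold entity spans person tokens 1-3; the prediction misses the last token.
gold = entities_to_token_labels([{"entity": "per", "start": 1, "end": 3}], 5)
pred = entities_to_token_labels([{"entity": "per", "start": 1, "end": 2}], 5)

# Token-level comparison: each token is scored individually.
matches = sum(g == p for g, p in zip(gold, pred))
print(gold)     # ['O', 'per', 'per', 'per', 'O']
print(pred)     # ['O', 'per', 'per', 'O', 'O']
print(matches)  # 4
```

Under this tag-based view a partially correct entity still earns credit for the tokens it got right, rather than being counted as entirely wrong.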