Evaluation results on entity extraction

When supervised_embeddings (which uses CRFEntityExtractor) is used in the config (without any additional options) and evaluated, how are the metrics (precision, recall, F1 and accuracy) computed? I am asking because I found two contradictory pieces of information.

According to https://rasa.com/docs/rasa/user-guide/evaluating-models/, entity scoring seems to be based on a simple tag-based approach which splits multiword entities and evaluates performance per individual token. However, when I look at the code (rasa/crf_entity_extractor.py at master · RasaHQ/rasa · GitHub), the default setting of supervised_embeddings sets BILOU_flag to True. When BILOU_flag is true, the extracted entities are not individual tokens but full sequences of them (rasa/crf_entity_extractor.py at ab382d049471c8f8468547f6f69f3a11a76600aa · RasaHQ/rasa · GitHub).

Which one is right? When entity extraction is evaluated with the default supervised_embeddings pipeline, is each entity an individual token or the whole BIL sequence?

Hi @onue5, we are using the first approach you mentioned:

According to Evaluating Models, entity scoring seems to be based on a simple tag-based approach which splits multiword entities and evaluates performance per individual token.

The BILOU flag is only used to determine the tagging data format. It does not influence the evaluation. The method you mentioned (https://github.com/RasaHQ/rasa/blob/ab382d049471c8f8468547f6f69f3a11a76600aa/rasa/nlu/extractors/crf_entity_extractor.py#L310) just converts the BILOU tag schema into a simple schema. For example, it removes B- from an entity label.
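For illustration, a minimal sketch of what that conversion does (the helper name `strip_bilou_prefix` is made up here, not Rasa's actual function):

```python
def strip_bilou_prefix(tag: str) -> str:
    """Turn a BILOU tag like 'B-per' into the plain entity label 'per'."""
    if tag[:2] in ("B-", "I-", "L-", "U-"):
        return tag[2:]
    return tag  # 'O' (no entity) stays unchanged


print([strip_bilou_prefix(t) for t in ["O", "B-per", "I-per", "L-per", "O"]])
# ['O', 'per', 'per', 'per', 'O']
```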

@Tanja, Thanks for the answer.

To clarify, suppose the tagging is ["O", "B-per", "I-per", "L-per", "O"]. Would the corresponding entity (considered during the evaluation) be {"entity": "per", "start": 1, "end": 3}, where "entity", "start", and "end" represent the entity label, the start index, and the end index respectively? Or would it be {"entity": "per", "start": 1, "end": 1}, {"entity": "per", "start": 2, "end": 2}, {"entity": "per", "start": 3, "end": 3}?

I looked at the code again (rasa/crf_entity_extractor.py at ab382d049471c8f8468547f6f69f3a11a76600aa · RasaHQ/rasa · GitHub). The method calls self._handle_bilou_label, which then calls self._find_bilou_end (rasa/crf_entity_extractor.py at ab382d049471c8f8468547f6f69f3a11a76600aa · RasaHQ/rasa · GitHub). This method seems to find the ending index of a multiword entity when one is tagged. If that is true, aren't the whole sequences of the multiword entities considered during the evaluation?

Can you point me to the part of the code where the multiword entities are split into individual tokens for the evaluation?

Let’s look at an example:

(Rasa Technologies GmbH)[company] is based in (Berlin)[location].

This is equivalent to

{
  "entities": [
    {
      "start": 0,
      "end": 22,
      "entity": "company",
      "value": "Rasa Technologies GmbH"
    },
    {
      "start": 35,
      "end": 41,
      "entity": "location",
      "value": "Berlin"
    }
  ]
}

During evaluation we convert the above sentence to

["company", "company", "company", "no-entity", "no-entity", "no-entity", "location"]

The same is done for the predictions, so you end up with an array of gold labels and an array of predicted labels. We use the evaluation metrics from sklearn to obtain the f-score, precision, recall, etc. (see, for example, sklearn.metrics.classification_report — scikit-learn 0.19.2 documentation).
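As a rough, self-contained sketch of that per-token evaluation (the label arrays are the ones from the example above, and the prediction is made up; this is not the exact code in rasa/test.py):

```python
from sklearn.metrics import classification_report

# Per-token gold labels for "Rasa Technologies GmbH is based in Berlin"
gold = ["company", "company", "company", "no-entity", "no-entity", "no-entity", "location"]
# Hypothetical prediction that only caught part of the multiword entity
pred = ["company", "company", "no-entity", "no-entity", "no-entity", "no-entity", "location"]

# Every token counts as one classification decision, so a partially
# recognised multiword entity still contributes partial credit.
print(classification_report(gold, pred))
```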

Regarding your questions:

  1. The first one is correct: {"entity": "per", "start": 1, "end": 3}. The B, I, and L tags would be merged together (see the sketch after this list).
  2. Not 100% sure what you mean. We convert the BILOU format to the JSON format you can see above, and during evaluation we use the array you see above, i.e. the entity is split into tokens again. Does that clarify your question?
  3. Take a look at rasa/test.py at master · RasaHQ/rasa · GitHub
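To make point 1 concrete, here is a hypothetical sketch of merging a B-/I-/L- run into a single entity with inclusive token indices, as in your example (this is not Rasa's actual _handle_bilou_label / _find_bilou_end implementation):

```python
def merge_bilou(tags):
    """Merge B-/I-/L- runs (and single U- tags) into entity dicts with inclusive token indices."""
    entities, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("U-"):          # single-token entity
            entities.append({"entity": tag[2:], "start": i, "end": i})
        elif tag.startswith("B-"):        # beginning of a multiword entity
            start = i
        elif tag.startswith("L-") and start is not None:  # last token of the entity
            entities.append({"entity": tag[2:], "start": start, "end": i})
            start = None
    return entities


print(merge_bilou(["O", "B-per", "I-per", "L-per", "O"]))
# [{'entity': 'per', 'start': 1, 'end': 3}]
```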