Hi. Yes, that does seem somewhat cheeky. They are treating non-entities as an entity class named "no_entity".

Recall = True Positives / (True Positives + False Negatives)
Precision = True Positives / (True Positives + False Positives)
F1-Score = 2 × Precision × Recall / (Precision + Recall)

Notice that none of these calculations ever uses True Negatives (the number of times the classifier correctly identifies a non-entity as not being any entity), so correct non-entity predictions don't usually contribute to the F1-score. By creating a label called "no_entity", however, those correct rejections now count as True Positives of that class, which does impact the F1-score.

It isn't so bad if it's just used to work out the precision, recall, and F1-score of "no_entity" on its own. But including it in the average skews the results, because non-entity tokens typically dominate the data and are comparatively easy to get right. I'd say take it out and recalculate the averages; you'd get a more accurate representation of the NER's performance.
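To make the skew concrete, here's a small sketch with made-up token-level tags (the label names and data are hypothetical, and this computes plain per-class precision/recall/F1 rather than whatever tooling they used):

```python
def prf1(gold, pred, labels):
    """Per-label (precision, recall, F1) from token-level gold and predicted tags."""
    stats = {}
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        stats[label] = (prec, rec, f1)
    return stats

def macro_f1(stats):
    """Unweighted average of per-label F1 scores."""
    return sum(f1 for _, _, f1 in stats.values()) / len(stats)

# Toy NER output: most tokens are non-entities, and those are mostly correct.
gold = ["PER", "no_entity", "no_entity", "LOC", "no_entity",
        "no_entity", "PER", "no_entity", "no_entity", "no_entity"]
pred = ["PER", "no_entity", "no_entity", "no_entity", "no_entity",
        "no_entity", "LOC", "no_entity", "no_entity", "no_entity"]

with_no = prf1(gold, pred, ["PER", "LOC", "no_entity"])
without = prf1(gold, pred, ["PER", "LOC"])

print(f"macro-F1 incl. no_entity: {macro_f1(with_no):.2f}")  # 0.53
print(f"macro-F1 excl. no_entity: {macro_f1(without):.2f}")  # 0.33
```

The easy, abundant "no_entity" class scores a high F1 on its own, and averaging it in inflates the headline number well above what the model achieves on the actual entity classes.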