When supervised_embeddings (which uses CRFEntityExtractor) is used in the config without any additional options and the model is evaluated, how are the entity metrics (precision, recall, F1 and accuracy) computed? I am asking because I found two contradictory pieces of information.
According to https://rasa.com/docs/rasa/user-guide/evaluating-models/, entity scoring seems to use a simple tag-based approach that splits multi-word entities into tokens and evaluates performance on each individual token. However, when I look at the code (rasa/crf_entity_extractor.py at master · RasaHQ/rasa · GitHub), the default setting of supervised_embeddings sets BILOU_flag to True. When BILOU_flag is true, the extracted entities are not individual tokens but full sequences of them (rasa/crf_entity_extractor.py at ab382d049471c8f8468547f6f69f3a11a76600aa · RasaHQ/rasa · GitHub).
Which one is right? When entity extraction is evaluated with the default supervised_embeddings pipeline, is each entity scored as an individual token or as the whole BIL sequence?
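
To make the two readings concrete, here is a toy sketch of what I mean. It is my own illustration in plain Python, not Rasa's evaluation code: the function names (`token_level_scores`, `span_level_scores`, etc.) and the example tags are made up, and I assume well-formed BILOU tags.

```python
from typing import List, Set, Tuple


def strip_prefix(tag: str) -> str:
    """Drop a BILOU prefix: 'B-city' -> 'city'; 'O' stays 'O'."""
    return tag.split("-", 1)[1] if "-" in tag else tag


def prf(tp: int, n_pred: int, n_gold: int) -> Tuple[float, float, float]:
    """Precision, recall and F1 from raw counts (0.0 when undefined)."""
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def token_level_scores(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Reading 1: every token is scored on its own entity label."""
    g = [strip_prefix(t) for t in gold]
    p = [strip_prefix(t) for t in pred]
    tp = sum(1 for a, b in zip(g, p) if a == b and a != "O")
    n_pred = sum(1 for b in p if b != "O")
    n_gold = sum(1 for a in g if a != "O")
    return prf(tp, n_pred, n_gold)


def bilou_spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Collect (start, end, label) spans from BILOU tags."""
    spans: Set[Tuple[int, int, str]] = set()
    start = None
    for i, tag in enumerate(tags):
        if tag.startswith("U-"):
            spans.add((i, i, tag[2:]))
            start = None
        elif tag.startswith("B-"):
            start = i
        elif tag.startswith("L-") and start is not None:
            # Only close the span if the label matches the opening B- tag.
            if tags[start][2:] == tag[2:]:
                spans.add((start, i, tag[2:]))
            start = None
        elif tag == "O":
            start = None
        # I- tags simply continue the currently open span.
    return spans


def span_level_scores(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Reading 2: an entity counts only if the whole span matches exactly."""
    g, p = bilou_spans(gold), bilou_spans(pred)
    return prf(len(g & p), len(p), len(g))


# "fly to New York City tomorrow": the prediction misses the token "City".
gold = ["O", "O", "B-city", "I-city", "L-city", "O"]
pred = ["O", "O", "B-city", "L-city", "O", "O"]

print("token-level:", token_level_scores(gold, pred))
print("span-level: ", span_level_scores(gold, pred))
```

With this example, the token-level reading gives partial credit (two of the three gold tokens are matched), while the span-level reading gives no credit at all because the predicted span is not identical to the gold span. That difference is why I would like to know which of the two the evaluation actually reports.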