Multiple NER

Hi

For some reason, I seem to remember that an older version of Rasa NLU allowed the use of multiple NER engines in the NLU pipeline: if a named entity was recognized by more than one engine, the one with the highest confidence value was retained. But I can’t find any information on this (and I am not sure I ever actually read about it…).

Anyway, long story short, I would be interested in incorporating multiple NER components (along with training my own TF) in my pipeline, since each of them brings something unique to the table. But the current documentation makes no mention of this, or of whether it is permissible or advised…

And if it is supported, are there any particular concerns about how to structure the pipeline?

Thanks in advance for your input!

Best, H

You are free to use multiple NER components in your pipeline. Your final entity map will contain all the entities extracted by different extractors.
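To see which component produced which entity, the entity list of a parse result can be grouped by extractor. This is only a minimal sketch: it assumes each entity is a dict carrying an "extractor" key next to "entity", "value", "start" and "end". The helper name `entities_by_extractor` and the sample result are made up for illustration and are not real model output.

```python
from collections import defaultdict


def entities_by_extractor(parse_result):
    """Group the entities of one parse result by the component that produced them."""
    grouped = defaultdict(list)
    for entity in parse_result.get("entities", []):
        # "extractor" names the pipeline component; fall back to "unknown" if it is absent.
        grouped[entity.get("extractor", "unknown")].append(entity)
    return dict(grouped)


# Hand-written sample shaped like a parse result (assumption, not real output).
sample = {
    "text": "book an appointment for thursday at 8:20am",
    "entities": [
        {"entity": "DATE", "value": "thursday", "start": 24, "end": 32,
         "extractor": "ner_spacy"},
        {"entity": "booking_time", "value": "8:20am", "start": 36, "end": 42,
         "extractor": "ner_crf"},
    ],
}

for extractor, found in entities_by_extractor(sample).items():
    print(extractor, [e["value"] for e in found])
```

If one of the extractors never shows up in the grouped result, that is a hint that the component is missing from the pipeline or that the training data has no annotations for it.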

In my experience, when I had both CRF and spaCy in the pipeline, only the CRF entities would appear.

I used CRF and spaCy together and was able to run them successfully.

Do you get spacy results and CRF results separately for extracted entities?

If I’ve annotated my example sentences in nlu with both of them, then yes. I get them separately.

Can you give an example of the annotation and the output of the NLU parsing command?

book an appointment for [thursday](DATE) at [8:20am](booking_time)

DATE is from spaCy and booking_time is from my CRF.
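For anyone unsure how the `[value](entity)` annotation above maps to character offsets, here is a rough sketch that strips the markup and records where each entity lands in the plain text. It is not Rasa's own loader; the function name `parse_annotated` and the output dict layout are assumptions chosen for illustration.

```python
import re

# Matches "[value](entity)" pairs in an annotated training example.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def parse_annotated(example):
    """Return (plain_text, entities) for one annotated example string."""
    plain_parts, entities = [], []
    cursor = 0   # position in the annotated string
    offset = 0   # position in the plain text built so far
    for match in ANNOTATION.finditer(example):
        # Copy the unannotated text that precedes this annotation.
        before = example[cursor:match.start()]
        plain_parts.append(before)
        offset += len(before)

        value, label = match.group(1), match.group(2)
        entities.append({"entity": label, "value": value,
                         "start": offset, "end": offset + len(value)})
        plain_parts.append(value)
        offset += len(value)
        cursor = match.end()
    # Copy whatever follows the last annotation.
    plain_parts.append(example[cursor:])
    return "".join(plain_parts), entities


text, found = parse_annotated(
    "book an appointment for [thursday](DATE) at [8:20am](booking_time)"
)
print(text)
print(found)
```

The sketch only handles the simple `[value](entity)` form used in this thread.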

Currently, this is not on my system but it definitely worked. I was getting both the extracted entities. Let me see if I can find that and I’ll get back.

Ah no, you are right. I found this out earlier but forgot to reply here. I think previously I didn’t have the proper components loaded or something; now I get spaCy and CRF NER results side by side :smiley: Thanks for the replies!

Hi Srikar, yes, pls share the annotation. I would love to have a look to see how you did it.

Thx! SH

This is my config file

language: "en"
pipeline:
  - name: "nlp_spacy"
  - name: "tokenizer_spacy"
  - name: "intent_entity_featurizer_regex"
  - name: "intent_featurizer_spacy"
  - name: "ner_crf"
  - name: "ner_spacy"
  - name: "ner_synonyms"
  - name: "intent_classifier_sklearn"

My nlu data is something like this:

- the boy has a stick of [height](vit) [20](vit_value) [cms](units)
- girl is has a bag of [weight](vit) [345](vit_value) [kgs](units)

So numbers like 20 and 345 are extracted by CRF as well as by spaCy, and I get both in my extracted entities. The same goes for other entities such as date, time, etc.
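Since both extractors can return the same span, anyone who wants the "keep the most confident one" behaviour asked about at the top of the thread could do it as a post-processing step on the entity list. The following is only a sketch of that idea, not something the pipeline does for you. It assumes each entity dict has "start" and "end" and may have a "confidence" value; a missing or empty confidence is treated as 0.0. The labels and confidence numbers in the sample are invented.

```python
def keep_most_confident(entities):
    """From overlapping entity spans, keep only the one with the highest confidence."""
    def score(entity):
        # Assumption: a missing or None confidence counts as 0.0.
        return entity.get("confidence") or 0.0

    kept = []
    # Visit candidates from most to least confident; keep one only if it
    # does not overlap a span that has already been kept.
    for candidate in sorted(entities, key=score, reverse=True):
        overlaps = any(
            candidate["start"] < other["end"] and other["start"] < candidate["end"]
            for other in kept
        )
        if not overlaps:
            kept.append(candidate)
    return sorted(kept, key=lambda e: e["start"])


# Hypothetical entities for "the boy has a stick of height 20 cms".
found = [
    {"entity": "vit_value", "value": "20", "start": 30, "end": 32,
     "extractor": "ner_crf", "confidence": 0.9},
    {"entity": "CARDINAL", "value": "20", "start": 30, "end": 32,
     "extractor": "ner_spacy", "confidence": None},
    {"entity": "units", "value": "cms", "start": 33, "end": 36,
     "extractor": "ner_crf", "confidence": 0.8},
]

for entity in keep_most_confident(found):
    print(entity["extractor"], entity["entity"], entity["value"])
```

Treating a missing confidence as 0.0 means an extractor that reports no score always loses to one that does, which may or may not be what you want.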