Using NER as a Feature for CRFEntityExtractor

Hi all, I’ve been playing around with Rasa (NLU-only) and am wondering how to achieve this –

I’m working on a slot-filling task in a low-data setting (roughly 20 samples per label). I want to train a custom entity extractor, and have decided to use CRFEntityExtractor (because I don’t have enough data for DIET).

Think of an utterance such as "I want to depart from New York", with New York = departure as the slot label. My idea is to use a pre-trained NER model, e.g. from spaCy, to first extract New York as a city. Then combine that with token embeddings, e.g. via the LanguageModelFeaturizer component using a Transformer model to create contextual embeddings, and use both the entity label and the token embeddings as features to train the CRF tagger.

My questions:

  1. How could I combine an entity prediction from the pretrained NER and use it as a feature for the CRF?
  2. Would this be the correct config:
pipeline:
  - name: SpacyNLP
    model: en_core_web_trf
    case_sensitive: False
  - name: SpacyTokenizer
  - name: LanguageModelFeaturizer
    # Name of the language model to use
    # choose from ['bert', 'gpt', 'gpt2', 'xlnet', 'distilbert', 'roberta']
    # or create a new class inheriting from this class to support your model.
    model_name: "bert"
    # Pre-Trained weights to be loaded
    model_weights: "bert-base-uncased"
  - name: LexicalSyntacticFeaturizer
    features: [
      # features for the word preceding the word being evaluated
      ["suffix2", "prefix2", "pos2"],
      # features for the word being evaluated
      ["BOS", "EOS", "pos2"],
      # features for the word following the word being evaluated
      ["suffix2", "prefix2", "pos2"],
    ]
  - name: CRFEntityExtractor

Thanks for any help :blush:

At the time of writing this answer, the entities extracted by entity extractors are not stored as features for the machine learning pipeline.

It’s explained in full detail in this NLU guide.

That said, what entities are you trying to detect? Cities? If that’s the case, it might be more pragmatic to start with a name list or spaCy before considering BERT features.
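For what it’s worth, a name-list approach can be as simple as matching token spans against a known list of cities. A minimal sketch in plain Python (the city list and the IOB tag names here are purely illustrative):

```python
# Minimal name-list entity matcher: tags tokens that appear in a known
# city list with IOB-style labels. The city list is illustrative only.
CITIES = {("new", "york"), ("chicago",), ("san", "francisco")}

def tag_cities(tokens):
    """Return one IOB tag per token: B-city / I-city / O."""
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    for city in CITIES:
        n = len(city)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == city:
                tags[i] = "B-city"
                for j in range(i + 1, i + n):
                    tags[j] = "I-city"
    return tags

print(tag_cities(["I", "want", "to", "fly", "to", "New", "York", "from", "Chicago"]))
# → ['O', 'O', 'O', 'O', 'O', 'B-city', 'I-city', 'O', 'B-city']
```

In Rasa itself, a lookup table with RegexEntityExtractor covers this kind of exact matching without custom code.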

Thanks for the response!

I am using Rasa NLU to run several benchmarking experiments on various slot-filling datasets, so I don’t have a particular entity in mind. I just thought it would be useful to use pre-trained named entities as features for a custom entity extractor.

For example, in "I want to fly to New York from Chicago", New York being a city could be a useful feature for classifying it as a departure slot.

I see now that spaCy is probably a good option for this, as it has several pre-trained NER models whose output can be used as features. I’ve had success implementing this with sklearn-crfsuite, but haven’t tried it with Rasa. I guess this would have to be some custom logic I add in.
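For reference, the sklearn-crfsuite approach boils down to including the pre-trained NER label in each token’s feature dict. A rough sketch (the feature names and IOB tags here are illustrative; in practice the NER tags would come from spaCy’s `token.ent_iob_` and `token.ent_type_`):

```python
def token2features(tokens, ner_tags, i):
    """Build a CRF feature dict for token i, combining simple lexical
    features with the pre-trained NER tag of the token and its neighbors."""
    word = tokens[i]
    features = {
        "word.lower": word.lower(),
        "word.suffix2": word[-2:],
        "word.prefix2": word[:2],
        "ner_tag": ner_tags[i],  # pre-trained NER label as a feature
    }
    if i > 0:
        features["-1:ner_tag"] = ner_tags[i - 1]
    else:
        features["BOS"] = True
    if i < len(tokens) - 1:
        features["+1:ner_tag"] = ner_tags[i + 1]
    else:
        features["EOS"] = True
    return features

tokens = ["I", "want", "to", "fly", "to", "New", "York"]
ner = ["O", "O", "O", "O", "O", "B-GPE", "I-GPE"]  # e.g. from spaCy
X = [token2features(tokens, ner, i) for i in range(len(tokens))]
# X can then be passed, one list per sentence, to sklearn_crfsuite.CRF().fit(...)
```

The same idea carries over to Rasa: the missing piece there is a component that turns the pre-trained entity labels into per-token features.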

The CRF implementation inside of Rasa is based on sklearn-crfsuite.

I’m actually reminded now that, technically (very technically), the DIET algorithm uses a CRF layer from TensorFlow. One interpretation of this layer is that it uses the entity information from neighboring tokens to predict whether the current token is an entity.
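To illustrate what such a CRF layer adds over per-token classification: learned transition scores between neighboring tags can override a token’s own ambiguous score. A toy decode in plain Python (all scores are made up for illustration; a real CRF uses Viterbi, but brute force keeps the example short):

```python
import itertools

# Toy setup: two tags, per-token emission scores, and tag-to-tag
# transition scores. A CRF picks the sequence that maximizes the sum
# of emissions plus transitions, rather than scoring tokens in isolation.
TAGS = ["O", "ENT"]
emissions = [
    {"O": 3.0, "ENT": 0.0},   # token 0: clearly O
    {"O": 1.1, "ENT": 1.0},   # token 1: ambiguous on its own (O slightly ahead)
    {"O": 0.0, "ENT": 2.0},   # token 2: clearly ENT
]
transitions = {("O", "O"): 0.5, ("O", "ENT"): -1.0,
               ("ENT", "ENT"): 1.5, ("ENT", "O"): -1.0}

def best_sequence(emissions):
    """Brute-force the max-scoring tag sequence over all combinations."""
    best, best_score = None, float("-inf")
    for seq in itertools.product(TAGS, repeat=len(emissions)):
        score = sum(emissions[i][t] for i, t in enumerate(seq))
        score += sum(transitions[(a, b)] for a, b in zip(seq, seq[1:]))
        if score > best_score:
            best, best_score = list(seq), score
    return best

print(best_sequence(emissions))
# → ['O', 'ENT', 'ENT'] — token 1 flips to ENT because of its neighbor,
# even though its own emission score slightly favored O
```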

In case you’re interested, you may find this YouTube video I made on name detection interesting. It tries to highlight how hard it can be to benchmark certain entities properly. Human names, in particular, turn out to be much harder than I thought they would be.


True, DIET does use a CRF head for decoding the sequence. However, as I understand it, it looks at the previously predicted custom entities when decoding a given entity; it would not consider, say, information from a pretrained NER unless that information is explicitly passed in as a feature, right?

True. The pre-trained entities, for example from spaCy, are not passed as features.