Hi all, I’ve been playing around with Rasa (NLU-only) and am wondering how to achieve this –
I'm working on a slot-filling task in a 'low-data' setting (i.e. 20 samples per label). I want to train a custom entity extractor and have decided to use CRFEntityExtractor (because I don't have enough data for DIET).
Think of an utterance such as "I want to depart from New York", with New York labelled as the departure slot. My idea is to use a pre-trained NER model, e.g. from spaCy, to first extract New York as a city. Then I'd combine that with token embeddings, e.g. via the LanguageModelFeaturizer component, using a Transformer model to create contextual embeddings, and use both the entity label and the token embeddings as features to train the CRF tagger.
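To make this concrete, here is roughly what I mean by using the entity label as a feature (a small sketch outside of Rasa, using spaCy's token.ent_iob_ / token.ent_type_ attributes; en_core_web_sm is just a placeholder model):

import spacy

# Any pre-trained spaCy pipeline with an NER component would do here.
nlp = spacy.load("en_core_web_sm")
doc = nlp("I want to depart from New York")

# Per-token NER tags from the pre-trained model, e.g. "B-GPE" for "New"
# and "I-GPE" for "York". These are the labels I would like to feed into
# the CRF as extra features next to the token embeddings.
ner_tags = [
    f"{tok.ent_iob_}-{tok.ent_type_}" if tok.ent_type_ else "O"
    for tok in doc
]
print(list(zip([tok.text for tok in doc], ner_tags)))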
My questions:
How could I take an entity prediction from the pretrained NER model and use it as a feature for the CRF?
Would this be the correct config:
pipeline:
- name: SpacyNLP
  model: en_core_web_trf
  case_sensitive: False
- name: SpacyTokenizer
- name: LanguageModelFeaturizer
  # Name of the language model to use
  # choose from ['bert', 'gpt', 'gpt2', 'xlnet', 'distilbert', 'roberta']
  # or create a new class inheriting from this class to support your model.
  model_name: "bert"
  # Pre-Trained weights to be loaded
  model_weights: "bert-base-uncased"
- name: LexicalSyntacticFeaturizer
  "features": [
    # features for the word preceding the word being evaluated
    ["suffix2", "prefix2", "pos2"],
    # features for the word being evaluated
    ["BOS", "EOS", "pos2"],
    # features for the word following the word being evaluated
    ["suffix2", "prefix2", "pos2"]
  ]
- name: CRFEntityExtractor
That said, what entities are you trying to detect? Cities? If that's the case, it might be more pragmatic to start with a name list or spaCy before considering BERT features.
I am using Rasa NLU to run several benchmarking experiments on various datasets for slot filling, so I don't have a particular entity in mind. I just thought it would be useful to be able to use pre-trained named entities as features for a custom entity extractor.
For example, in "I want to fly to New York from Chicago", New York being a city could be a useful feature for classifying it as a destination (or departure) slot.
I see now that spaCy is probably a good option for this, as it has several pre-trained NER models whose predictions can be used as features. I've had success implementing this with sklearn-crfsuite, but haven't tried it with Rasa. I guess this would have to be some custom logic I add in.
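For reference, this is roughly what that looked like with sklearn-crfsuite (a minimal sketch: the feature function, the toy training sentences and the BIO slot labels are all made up for illustration):

import spacy
import sklearn_crfsuite

nlp = spacy.load("en_core_web_sm")

def sent_features(text):
    """One feature dict per token, including the pre-trained NER tag."""
    doc = nlp(text)
    return [
        {
            "lower": tok.text.lower(),
            "suffix2": tok.text[-2:],
            "ner": f"{tok.ent_iob_}-{tok.ent_type_}" if tok.ent_type_ else "O",
        }
        for tok in doc
    ]

# Tiny made-up training set with BIO slot labels (one label per spaCy token).
texts = [
    "I want to depart from New York",
    "I want to fly to New York from Chicago",
]
labels = [
    ["O", "O", "O", "O", "O", "B-departure", "I-departure"],
    ["O", "O", "O", "O", "O", "B-destination", "I-destination", "O", "B-departure"],
]

X = [sent_features(t) for t in texts]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
crf.fit(X, labels)
print(crf.predict([sent_features("please fly from Chicago")]))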
The CRF implementation inside of Rasa is based on sklearn-crfsuite.
I'm actually reminded now that, technically (very technically), the DIET algorithm uses a CRF layer implemented in TensorFlow. One interpretation of this layer is that it uses entity information from the neighboring tokens to predict whether the current token is an entity.
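To make that interpretation a bit more concrete, here is a minimal sketch of CRF decoding using the tensorflow_addons CRF ops (not Rasa's actual code; the scores below are made-up numbers). The transition matrix is where information about the neighboring tokens' tags enters the prediction:

import tensorflow as tf
import tensorflow_addons as tfa

# Made-up per-token tag scores ("unary potentials") for one 3-token sentence
# and two tags: 0 = O (not an entity), 1 = CITY.
potentials = tf.constant(
    [[[2.0, 1.0],   # token 1 prefers O
      [1.0, 1.2],   # token 2 on its own slightly prefers CITY
      [2.0, 0.5]]], # token 3 prefers O
    dtype=tf.float32,
)

# Transition scores between tags of neighboring tokens: staying in the same
# tag is rewarded, switching tags is penalized.
transitions = tf.constant(
    [[1.0, -1.0],
     [-1.0, 1.0]],
    dtype=tf.float32,
)

sequence_length = tf.constant([3])

# Viterbi decoding picks the jointly best tag sequence, so token 2's tag is
# influenced by its neighbors: alone it leans towards CITY, but here it gets
# decoded as O.
tags, best_score = tfa.text.crf_decode(potentials, transitions, sequence_length)
print(tags.numpy())  # [[0 0 0]]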
In case you're interested, you may find this YouTube video I made on name detection useful. It tries to highlight how hard it can be to benchmark certain entities properly. Human names, in particular, turned out to be much harder than I thought they would be.
True, DIET does use a CRF head for decoding the sequence. However, as I understand it, it looks at the previously predicted custom entities when decoding a given token; it would not consider information from e.g. a pretrained NER model unless that information is explicitly passed in as a feature, right?