No Difference in Performance when Using or Changing Language Model Featurizers

The Problem

I tried to compare these two configs, using the same training and test data. I am training NLU only (the rasa train nlu command). I would expect a difference in performance, but creating token embeddings with either roberta or bert results in a trained CRF component with identical performance on the test set (down to the 10th decimal place). Why is that? Are the token embeddings not being passed to the CRFEntityExtractor?

Thanks for any insight.

Config 1:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: LanguageModelFeaturizer
    # Name of the language model to use
    # choose from ['bert', 'gpt', 'gpt2', 'xlnet', 'distilbert', 'roberta']
    # or create a new class inheriting from this class to support your model.
    model_name: "bert"
    # Pre-Trained weights to be loaded
    model_weights: "bert-base-uncased"
  - name: LexicalSyntacticFeaturizer
    features: [
      # features for the word preceding the word being evaluated
      [ "suffix2", "prefix2" ],
      # features for the word being evaluated
      [ "BOS", "EOS" ],
      # features for the word following the word being evaluated
      [ "suffix2", "prefix2" ]]
  - name: CRFEntityExtractor

Config 2:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: LanguageModelFeaturizer
    # Name of the language model to use
    # choose from ['bert', 'gpt', 'gpt2', 'xlnet', 'distilbert', 'roberta']
    # or create a new class inheriting from this class to support your model.
    model_name: "roberta"
    # Pre-Trained weights to be loaded
    model_weights: "roberta-base"
  - name: LexicalSyntacticFeaturizer
    features: [
      # features for the word preceding the word being evaluated
      [ "suffix2", "prefix2" ],
      # features for the word being evaluated
      [ "BOS", "EOS" ],
      # features for the word following the word being evaluated
      [ "suffix2", "prefix2" ]]
  - name: CRFEntityExtractor

Update: There is also no difference when I omit either the LanguageModelFeaturizer or the LexicalSyntacticFeaturizer!

Config 3

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: LanguageModelTokenizer
  - name: LexicalSyntacticFeaturizer
    features: [
      # features for the word preceding the word being evaluated
      [ "suffix2", "prefix2" ],
      # features for the word being evaluated
      [ "BOS", "EOS" ],
      # features for the word following the word being evaluated
      [ "suffix2", "prefix2" ]]
  - name: CRFEntityExtractor

Config 4

  - name: LanguageModelFeaturizer
    # Name of the language model to use
    # choose from ['bert', 'gpt', 'gpt2', 'xlnet', 'distilbert', 'roberta']
    # or create a new class inheriting from this class to support your model.
    model_name: "roberta"
    # Pre-Trained weights to be loaded
    model_weights: "roberta-base"
  - name: CRFEntityExtractor

I get exactly the same results when I test the models, each trained with its specified config, on the same test data and look at the entity extraction report. What am I doing wrong?

This appears to be the same issue as the one discussed here: CRF with dense features

Why is that?

I think I’ve found the issue. It’s explained in more detail in the GitHub issue here.
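For anyone landing on this thread later: the behavior above is consistent with how CRFEntityExtractor selects its input features. The component only uses the features named in its `features` option, and the default feature set contains sparse lexical features only, so dense embeddings from a LanguageModelFeaturizer are silently ignored unless `text_dense_features` is added explicitly. A sketch of a config that should make the CRF consume the embeddings (this is my reading of the docs, not a tested fix; the surrounding feature names are just illustrative defaults, and the WhitespaceTokenizer line assumes a Rasa 2.x pipeline where the featurizer needs a preceding tokenizer — verify both against your Rasa version):

```yaml
pipeline:
  # In Rasa 2.x, LanguageModelFeaturizer requires a tokenizer before it.
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "bert-base-uncased"
  - name: CRFEntityExtractor
    # "text_dense_features" is what tells the CRF to use the dense
    # embeddings produced by the featurizer above; without it, the
    # CRF trains on sparse lexical features only, which would explain
    # identical results across BERT, RoBERTa, and no featurizer at all.
    features: [
      ["low", "suffix2", "prefix2"],
      ["low", "bias", "suffix2", "prefix2", "text_dense_features"],
      ["low", "suffix2", "prefix2"]]
```

With this in place, swapping `model_name`/`model_weights` between bert and roberta should finally change the entity extraction report, since the CRF input now depends on the embeddings.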