No Difference in Performance when Using or Changing Language Model Featurizers

The Problem

I tried to compare these two configs, using the same training and test data. I am training NLU only (the rasa train nlu command). I would expect a difference in performance, but creating token embeddings with either roberta or bert results in a trained CRF component with identical performance on the test set (down to the 10th decimal place). Why is that? Are the token embeddings not being passed to the CRFEntityExtractor?

Thanks for any insight.

Config 1:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: LanguageModelFeaturizer
    # Name of the language model to use
    # choose from ['bert', 'gpt', 'gpt2', 'xlnet', 'distilbert', 'roberta']
    # or create a new class inheriting from this class to support your model.
    model_name: "bert"
    # Pre-Trained weights to be loaded
    model_weights: "bert-base-uncased"
  - name: LexicalSyntacticFeaturizer
    features: [
      # features for the word preceding the word being evaluated
      [ "suffix2", "prefix2" ],
      # features for the word being evaluated
      [ "BOS", "EOS" ],
      # features for the word following the word being evaluated
      [ "suffix2", "prefix2" ]]
  - name: CRFEntityExtractor

Config 2:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: LanguageModelFeaturizer
    # Name of the language model to use
    # choose from ['bert', 'gpt', 'gpt2', 'xlnet', 'distilbert', 'roberta']
    # or create a new class inheriting from this class to support your model.
    model_name: "roberta"
    # Pre-Trained weights to be loaded
    model_weights: "roberta-base"
  - name: LexicalSyntacticFeaturizer
    features: [
      # features for the word preceding the word being evaluated
      [ "suffix2", "prefix2" ],
      # features for the word being evaluated
      [ "BOS", "EOS" ],
      # features for the word following the word being evaluated
      [ "suffix2", "prefix2" ]]
  - name: CRFEntityExtractor

Update: There is also no difference when I omit either the LanguageModelFeaturizer or the LexicalSyntacticFeaturizer!

Config 3

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: LanguageModelTokenizer
  - name: LexicalSyntacticFeaturizer
    features: [
      # features for the word preceding the word being evaluated
      [ "suffix2", "prefix2" ],
      # features for the word being evaluated
      [ "BOS", "EOS" ],
      # features for the word following the word being evaluated
      [ "suffix2", "prefix2" ]]
  - name: CRFEntityExtractor

Config 4

  - name: LanguageModelFeaturizer
    # Name of the language model to use
    # choose from ['bert', 'gpt', 'gpt2', 'xlnet', 'distilbert', 'roberta']
    # or create a new class inheriting from this class to support your model.
    model_name: "roberta"
    # Pre-Trained weights to be loaded
    model_weights: "roberta-base"
  - name: CRFEntityExtractor

I get exactly the same results when I test the models, each trained with its specified config, on the same test data and look at the entity extraction report. What am I doing wrong?

This appears to be the same issue as the one discussed here: CRF with dense features

Why is that?

I think I’ve found the issue. It’s explained in more detail in the GitHub issue here.
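For anyone landing on this thread later: the behavior above is consistent with how CRFEntityExtractor selects its input features. The component only uses the features named in its `features` option, and the default feature set contains sparse lexical features only, so dense embeddings from a LanguageModelFeaturizer are silently ignored unless `text_dense_features` is added explicitly. A sketch of a config that should make the CRF consume the embeddings (this is my reading of the docs, not a tested fix; the surrounding feature names are just illustrative defaults, and the WhitespaceTokenizer line assumes a Rasa 2.x pipeline where the featurizer needs a preceding tokenizer — verify both against your Rasa version):

```yaml
pipeline:
  # In Rasa 2.x, LanguageModelFeaturizer requires a tokenizer before it.
  - name: WhitespaceTokenizer
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "bert-base-uncased"
  - name: CRFEntityExtractor
    # "text_dense_features" is what tells the CRF to use the dense
    # embeddings produced by the featurizer above; without it, the
    # CRF trains on sparse lexical features only, which would explain
    # identical results across BERT, RoBERTa, and no featurizer at all.
    features: [
      ["low", "suffix2", "prefix2"],
      ["low", "bias", "suffix2", "prefix2", "text_dense_features"],
      ["low", "suffix2", "prefix2"]]
```

With this in place, swapping `model_name`/`model_weights` between bert and roberta should finally change the entity extraction report, since the CRF input now depends on the embeddings.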