Custom Transformer-based Featurizer producing inconsistent outputs vs HuggingFace's version

Hi,

I am following the lm_featurizer (LanguageModelFeaturizer) as a template to implement a dense featurizer that uses PaddleNLP's Transformer API (PaddleNLP/transformers.md at develop · PaddlePaddle/PaddleNLP · GitHub).

While the BERT models are the same in terms of weights, and the methods seem largely similar, I am getting unstable results: running the same input more than once through the NLU shell gives fluctuating confidence scores. This never happened with the HuggingFace-based LanguageModelFeaturizer version.

I am not an expert in this space (still learning), so I am looking for ideas on how this could happen. I suspect it comes down to how I did the integration: I followed the pipeline all the way to the point where the model is run against the tokens and attention masks to generate the sequence embeddings. I am happy to share the code repo so someone can help diagnose it and point out what changes are needed to make the output consistent.
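
For reference, here is a stripped-down sketch of roughly what that embedding step looks like on my side (illustrative names and a generic checkpoint, not my exact code; the tokenizer/model arguments may differ slightly across PaddleNLP versions). It runs the PaddleNLP BERT model on the input ids and attention mask, then compares two runs of the same input to see whether the fluctuation already appears at the embedding level:

```python
# Sketch of the embedding step plus a run-to-run consistency check.
import numpy as np
import paddle
from paddlenlp.transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()  # put the Paddle layers into inference mode (disables dropout)


def embed(text: str) -> np.ndarray:
    # Tokenize a single utterance; ask the tokenizer for the attention mask too.
    encoded = tokenizer(text, return_attention_mask=True)
    input_ids = paddle.to_tensor([encoded["input_ids"]])
    attention_mask = paddle.to_tensor([encoded["attention_mask"]])
    with paddle.no_grad():
        # BertModel returns (sequence_output, pooled_output).
        sequence_output, _pooled_output = model(
            input_ids, attention_mask=attention_mask
        )
    return sequence_output.numpy()


# Same input twice; the featurizer output should be bit-for-bit identical.
a = embed("book a table for two")
b = embed("book a table for two")
print("max abs diff between two runs:", np.abs(a - b).max())
```

If this check already shows non-zero differences, the instability is inside my featurizer rather than downstream in the classifier, which would narrow down where to look.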