Custom sentiment analysis component issue with Core

Hello there,

I’ve developed a custom sentiment analysis component for Rasa 2.x. I followed an old tutorial and adapted it to the new requirements of Rasa. Here is the code:

import typing
from typing import Any, Optional, Text, Dict, List, Type

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.shared.nlu.training_data.training_data import TrainingData
from rasa.shared.nlu.training_data.message import Message


from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline

tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")  # Note: this uses the auto tokenizer, which might cause bugs; still to be tested.
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)


if typing.TYPE_CHECKING:
    from rasa.nlu.model import Metadata


class SentimentAnalyzer(Component):
    """A pretrained sentiment analyzer component"""

    @classmethod
    def required_components(cls) -> List[Type[Component]]:
        """Specify which components need to be present in the pipeline."""

        return []

    defaults = {}

    supported_language_list = ["fr"]

    not_supported_language_list = None

    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super().__init__(component_config)

    def train(
        self,
        training_data: TrainingData,
        config: Optional[RasaNLUModelConfig] = None,
        **kwargs: Any,
    ) -> None:
       
        pass

    def convert_to_rasa(self, value, confidence):
        """Convert model output into the Rasa NLU compatible output format."""
        
        entity = {
            "entity": "sentiment",
            "confidence_entity": confidence,
            "value": value,
            "extractor": "sentiment.SentimentAnalyzer",
        }

        return entity

    def process(self, message: Message, **kwargs: Any) -> None:
      
        # Run the model once and reuse the result for both label and score.
        result = nlp(message.get('text'))[0]
        sentiment, confidence = result['label'], result['score']
        entity = self.convert_to_rasa(sentiment, confidence)
        message.set("entities", [entity], add_to_output=True)


    def persist(self, file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]:
        """Persist this component to disk for future loading."""
        pass

    @classmethod
    def load(
        cls,
        meta: Dict[Text, Any],
        model_dir: Text,
        model_metadata: Optional["Metadata"] = None,
        cached_component: Optional["Component"] = None,
        **kwargs: Any,
    ) -> "Component":
        """Load this component from file."""

        if cached_component:
            return cached_component
        else:
            return cls(meta)

During development, I had an issue at the NLU level because my component requires text but received tokens.

ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).

I solved that issue by passing message.get('text') to the model, so the NLU output is now correct, as shown in the following picture.

Now, when I try to train the whole model, it fails at the Core level, specifically with the TEDPolicy. I get the same error and I don’t know how to solve it. Here is my pipeline:

language: fr

pipeline:

  - name: "sentiment.SentimentAnalyzer"
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 10
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
    batch_strategy: sequence
    entity_recognition: true
  - name: EntitySynonymMapper
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.02
  

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
    max_history: 10
  - name: TEDPolicy
    epochs: 100
  - name: RulePolicy
  - name: UnexpecTEDIntentPolicy
    epochs: 2

The problem is that the TEDPolicy fails when it receives the sentiment entity. I tried using the same format as standard Rasa entities and added sentiment as an entity in the domain file, but that did not help. Any ideas on how to solve this?

Not sure whether you have solved this yet, but I would try moving your component down the pipeline: place it right after - name: WhitespaceTokenizer and before - name: LexicalSyntacticFeaturizer.
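
Another thing that might be worth checking, and this is an assumption on my part rather than something verified against your setup: during Core training the NLU pipeline may also be run on messages that carry no text at all (for example messages built only from an intent or an action name). In that case message.get('text') returns None, and passing None to the transformers pipeline would raise exactly the ValueError you quoted. A minimal sketch of a guard is below. The helper name sentiment_entity and the stub fake_classifier are hypothetical and only there to make the example self-contained; in your component the classifier would be your nlp object.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in type for the transformers pipeline: it takes a string
# and returns a list containing one {"label": ..., "score": ...} dict.
Classifier = Callable[[str], List[Dict[str, Any]]]


def sentiment_entity(
    text: Optional[str], classifier: Classifier
) -> Optional[Dict[str, Any]]:
    """Return a sentiment entity dict, or None when there is no text to classify."""
    if not isinstance(text, str) or not text.strip():
        # Nothing to classify: skip instead of handing None to the tokenizer.
        return None
    result = classifier(text)[0]  # call the model once, reuse label and score
    return {
        "entity": "sentiment",
        "confidence_entity": result["score"],
        "value": result["label"],
        "extractor": "sentiment.SentimentAnalyzer",
    }


def fake_classifier(text: str) -> List[Dict[str, Any]]:
    """Stub used only in this example; rejects non-string input like the real tokenizer."""
    if not isinstance(text, str):
        raise ValueError("text input must be of type str")
    return [{"label": "POSITIVE", "score": 0.9}]


print(sentiment_entity(None, fake_classifier))
print(sentiment_entity("super film", fake_classifier))
```

Inside process you would then call sentiment_entity(message.get('text'), nlp) and only call message.set("entities", ...) when the result is not None, so text-less messages pass through the component untouched.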