Custom sentiment analysis components issue with Core

ygdo · November 5, 2021, 1:56pm

Hello there,

I’ve developed a custom sentiment analysis component for Rasa 2.x. I followed an old tutorial and adapted it to the new Rasa requirements. Here is the code:

import typing
from typing import Any, Optional, Text, Dict, List, Type

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.shared.nlu.training_data.training_data import TrainingData
from rasa.shared.nlu.training_data.message import Message


from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline

# Note: this uses the model's own AutoTokenizer, which might cause bugs; still to be tested.
tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)


if typing.TYPE_CHECKING:
    from rasa.nlu.model import Metadata


class SentimentAnalyzer(Component):
    """A pretrained sentiment analyzer component"""

    @classmethod
    def required_components(cls) -> List[Type[Component]]:
        """Specify which components need to be present in the pipeline."""

        return []

    defaults = {}

    supported_language_list = ["fr"]

    not_supported_language_list = None

    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super().__init__(component_config)

    def train(
        self,
        training_data: TrainingData,
        config: Optional[RasaNLUModelConfig] = None,
        **kwargs: Any,
    ) -> None:
       
        pass

    def convert_to_rasa(self, value, confidence):
        """Convert model output into the Rasa NLU compatible output format."""
        
        entity = {
            "entity": "sentiment",
            "confidence_entity": confidence,
            "value": value,
            "extractor": "sentiment.SentimentAnalyzer",
        }

        return entity

    def process(self, message: Message, **kwargs: Any) -> None:
        # Run the model once and reuse the result for both fields.
        result = nlp(message.get('text'))[0]
        sentiment, confidence = result['label'], result['score']
        entity = self.convert_to_rasa(sentiment, confidence)
        message.set("entities", [entity], add_to_output=True)


    def persist(self, file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]:
        """Persist this component to disk for future loading."""
        pass

    @classmethod
    def load(
        cls,
        meta: Dict[Text, Any],
        model_dir: Text,
        model_metadata: Optional["Metadata"] = None,
        cached_component: Optional["Component"] = None,
        **kwargs: Any,
    ) -> "Component":
        """Load this component from file."""

        if cached_component:
            return cached_component
        else:
            return cls(meta)

During development, I first hit an issue at the NLU level because my component requires raw text but received tokens:

ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).

I solved that issue by passing the raw text from message.get('text') to the model. The NLU output is now correct, as shown in the picture attached to this post.
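The NLU-level fix described above can be sketched as a small helper that accepts only a plain string and calls the classifier once. This is a minimal sketch, not the component itself: `sentiment_entity`, the `Classifier` alias, and `fake_classifier` are hypothetical names introduced here, and the fake classifier only imitates the list-of-dicts output shape of the transformers pipeline so the example runs without downloading a model.

```python
from typing import Any, Callable, Dict, List

# Hypothetical alias: anything that maps a string to Hugging Face style
# pipeline output, e.g. [{"label": "POSITIVE", "score": 0.98}].
Classifier = Callable[[str], List[Dict[str, Any]]]


def sentiment_entity(text: str, classifier: Classifier) -> Dict[str, Any]:
    """Classify raw text once and return a Rasa-style entity dict."""
    if not isinstance(text, str):
        # Token objects or None would trigger the ValueError quoted above.
        raise TypeError(f"expected raw text as str, got {type(text).__name__}")
    result = classifier(text)[0]  # single model call, reused for both fields
    return {
        "entity": "sentiment",
        "confidence_entity": result["score"],
        "value": result["label"],
        "extractor": "sentiment.SentimentAnalyzer",
    }


def fake_classifier(text: str) -> List[Dict[str, Any]]:
    """Stand-in for the transformers pipeline so the sketch runs offline."""
    return [{"label": "POSITIVE", "score": 0.9}]


entity = sentiment_entity("Ce film est excellent", fake_classifier)
```

In the real component, `nlp` from the code above would take the place of `fake_classifier`.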

Now, when I try to train the whole model, it fails at the Core level, specifically with the TEDPolicy. I get the same error and I don’t know how to solve it. Here is my pipeline:

language: fr

pipeline:

  - name: "sentiment.SentimentAnalyzer"
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 10
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
    batch_strategy: sequence
    entity_recognition: true
  - name: EntitySynonymMapper
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.02
  

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
    max_history: 10
  - name: TEDPolicy
    epochs: 100
  - name: RulePolicy
  - name: UnexpecTEDIntentPolicy
    epochs: 2

The problem is that the TEDPolicy fails when it receives the sentiment entity. I tried using the same format as regular Rasa entities and added sentiment as an entity in the domain file, but neither helped. Any ideas on how to solve this?

jonathanpwheat · January 27, 2022, 7:46pm

Not sure whether you have solved this yet, but I would try moving your component down the pipeline. Try placing it right after WhitespaceTokenizer and before LexicalSyntacticFeaturizer.
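A possible cause of the TEDPolicy failure, not confirmed in this thread: in Rasa 2.x, Core training also sends messages built from story steps through the NLU pipeline, and those carry an intent or an action name but no text. In that case message.get('text') returns None and the transformers pipeline raises the same ValueError. Below is a minimal sketch of a guard that skips such messages. `FakeMessage`, `add_sentiment`, and `fake_classifier` are hypothetical stand-ins so the example runs without Rasa or a downloaded model; only the `get`/`set` calls mirror the Message methods used in the component above.

```python
from typing import Any, Callable, Dict, List, Optional

Classifier = Callable[[str], List[Dict[str, Any]]]


class FakeMessage:
    """Hypothetical stand-in for Rasa's Message: only get() and set()."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self.data = dict(data)

    def get(self, prop: str, default: Optional[Any] = None) -> Any:
        return self.data.get(prop, default)

    def set(self, prop: str, value: Any, add_to_output: bool = False) -> None:
        self.data[prop] = value


def add_sentiment(message: FakeMessage, classifier: Classifier) -> None:
    """Attach a sentiment entity only when the message carries user text."""
    text = message.get("text")
    if not isinstance(text, str) or not text.strip():
        # Assumed case: a message without text (e.g. an action name
        # taken from a story) has nothing to classify, so skip it.
        return
    result = classifier(text)[0]
    entity = {
        "entity": "sentiment",
        "confidence_entity": result["score"],
        "value": result["label"],
        "extractor": "sentiment.SentimentAnalyzer",
    }
    # Append to any existing entities instead of overwriting them.
    entities = list(message.get("entities", [])) + [entity]
    message.set("entities", entities, add_to_output=True)


def fake_classifier(text: str) -> List[Dict[str, Any]]:
    """Stand-in for the transformers pipeline so the sketch runs offline."""
    return [{"label": "NEGATIVE", "score": 0.8}]


story_step = FakeMessage({"action_name": "utter_greet"})  # no text
user_turn = FakeMessage({"text": "Ce film est nul"})
add_sentiment(story_step, fake_classifier)
add_sentiment(user_turn, fake_classifier)
```

The same early return could go at the top of `process` in the component. Appending to the entity list is a design choice made in this sketch; the original `process` replaces the list.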
