Hello there,
I’ve developed a custom sentiment-analysis component for Rasa 2.x. I followed an old tutorial and adapted it to the new Rasa requirements. Here is the code:
import typing
from typing import Any, Optional, Text, Dict, List, Type

from rasa.nlu.components import Component
from rasa.nlu.config import RasaNLUModelConfig
from rasa.shared.nlu.training_data.training_data import TrainingData
from rasa.shared.nlu.training_data.message import Message
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline

# Note: this uses the model's own AutoTokenizer, which might cause bugs; still to be tested.
tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

if typing.TYPE_CHECKING:
    from rasa.nlu.model import Metadata


class SentimentAnalyzer(Component):
    """A pretrained sentiment analyzer component"""

    @classmethod
    def required_components(cls) -> List[Type[Component]]:
        """Specify which components need to be present in the pipeline."""
        return []

    defaults = {}
    # Languages are given as a list of language codes.
    supported_language_list = ["fr"]
    not_supported_language_list = None

    def __init__(self, component_config: Optional[Dict[Text, Any]] = None) -> None:
        super().__init__(component_config)

    def train(
        self,
        training_data: TrainingData,
        config: Optional[RasaNLUModelConfig] = None,
        **kwargs: Any,
    ) -> None:
        # The model is pretrained, so there is nothing to train.
        pass

    def convert_to_rasa(self, value, confidence):
        """Convert model output into the Rasa NLU compatible output format."""
        entity = {"entity": "sentiment",
                  "confidence_entity": confidence,
                  "value": value,
                  "extractor": "sentiment.SentimentAnalyzer"}
        return entity

    def process(self, message: Message, **kwargs: Any) -> None:
        # Run the classifier once and read both fields from the same result.
        result = nlp(message.get('text'))[0]
        sentiment, confidence = result['label'], result['score']
        entity = self.convert_to_rasa(sentiment, confidence)
        message.set("entities", [entity], add_to_output=True)

    def persist(self, file_name: Text, model_dir: Text) -> Optional[Dict[Text, Any]]:
        """Persist this component to disk for future loading."""
        pass

    @classmethod
    def load(
        cls,
        meta: Dict[Text, Any],
        model_dir: Text,
        model_metadata: Optional["Metadata"] = None,
        cached_component: Optional["Component"] = None,
        **kwargs: Any,
    ) -> "Component":
        """Load this component from file."""
        if cached_component:
            return cached_component
        else:
            return cls(meta)

During development, I ran into an issue at the NLU level because my component requires text but received tokens:
ValueError: text input must of type `str` (single example), `List[str]` (batch or single pretokenized example) or `List[List[str]]` (batch of pretokenized examples).
I solved that issue by reading the raw text with message.get('text'). With that change, the NLU output is correct, as shown in the following picture.
Now, when I try to train the whole model, it fails at the Core level, specifically with the TED policy. I get the same error as above and I don’t know how to solve it. Here is my config, with the pipeline and the policies:
language: fr
pipeline:
  - name: "sentiment.SentimentAnalyzer"
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 10
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
    batch_strategy: sequence
    entity_recognition: true
  - name: EntitySynonymMapper
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.02

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
    max_history: 10
  - name: TEDPolicy
    epochs: 100
  - name: RulePolicy
  - name: UnexpecTEDIntentPolicy
    epochs: 2

The TED policy seems to break when it receives the sentiment entity. I tried using the same format as standard Rasa entities and added sentiment as an entity in the domain file, but neither helped. Any ideas on how to solve this?
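
One thing that might be worth checking, stated as an assumption rather than a confirmed diagnosis: if Core training passes messages through the NLU pipeline that have no text (for example, messages carrying only an action name or an intent), then message.get('text') returns None and the transformers pipeline would raise the same ValueError. Below is a minimal sketch of a guard for that case. It runs without Rasa or transformers: StubMessage and fake_classifier are hypothetical stand-ins for Rasa's Message and the nlp pipeline object, and the "action_name" key is only illustrative.

```python
from typing import Any, Callable, Dict, List, Optional


class StubMessage:
    """Hypothetical stand-in for Rasa's Message: a thin wrapper around a dict."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self.data: Dict[str, Any] = dict(data or {})

    def get(self, prop: str, default: Any = None) -> Any:
        return self.data.get(prop, default)

    def set(self, prop: str, value: Any, add_to_output: bool = False) -> None:
        self.data[prop] = value


def fake_classifier(text: Any) -> List[Dict[str, Any]]:
    """Hypothetical stand-in for the sentiment pipeline; rejects non-string input."""
    if not isinstance(text, str):
        raise ValueError("text input must be of type `str`")
    return [{"label": "POSITIVE", "score": 0.9}]


def process(
    message: StubMessage,
    classify: Callable[[Any], List[Dict[str, Any]]] = fake_classifier,
) -> None:
    """Attach a sentiment entity, skipping messages that carry no usable text."""
    text = message.get("text")
    if not isinstance(text, str) or not text.strip():
        # Nothing to classify, so leave the message untouched.
        return
    result = classify(text)[0]
    entity = {
        "entity": "sentiment",
        "confidence_entity": result["score"],
        "value": result["label"],
        "extractor": "sentiment.SentimentAnalyzer",
    }
    message.set("entities", [entity], add_to_output=True)


# A message with text gets a sentiment entity.
with_text = StubMessage({"text": "Super film"})
process(with_text)

# A message without text is skipped instead of raising ValueError.
without_text = StubMessage({"action_name": "utter_greet"})
process(without_text)
```

In the real component, the same check would go at the top of SentimentAnalyzer.process, before nlp is called. Whether this removes the TED policy error depends on the assumption above holding for your Rasa version.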