Sentiment analysis issue! Need help, please!

I always get the same result, even when I change the message. It always detects a positive sentiment with a confidence of 0.333:

{'intent': {'name': 'insultes', 'confidence': 0.35888357915887636},
 'entities': [{'value': 'pos', 'confidence': 0.3333333333333333, 'entity': 'sentiment', 'extractor': 'sentiment_extractor'}],
 'intent_ranking': [{'name': 'insultes', 'confidence': 0.35888357915887636},
                    {'name': 'salutations', 'confidence': 0.07922486366609584},
                    {'name': 'au_revoir', 'confidence': 0.06963384023417138},
                    {'name': 'out', 'confidence': 0.057168641232126574},
                    {'name': 'positive_answer', 'confidence': 0.05699548462320999},
                    {'name': 'congrats', 'confidence': 0.05146011802312083},
                    {'name': 'salaire', 'confidence': 0.05105493986389435},
                    {'name': 'annulation', 'confidence': 0.04765725667849151},
                    {'name': 'negative_answer', 'confidence': 0.04180718673910084},
                    {'name': 'merci', 'confidence': 0.03432302984684233}],
 'text': 'connard'}

My data contains NLU data and stories.

My config file:

# Configuration for Rasa NLU.
# Components
language: fr_core_news_md
pipeline:
  - name: "SpacyNLP"
  - name: "SpacyTokenizer"
  - name: "sentiment.SentimentAnalyzer"
  - name: "SpacyFeaturizer"
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
  - name: "SklearnIntentClassifier"

# Configuration for Rasa Core.
# Policies
policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy
  - name: FormPolicy
  - name: "FallbackPolicy"
    nlu_threshold: 0.25
    core_threshold: 0.3
    fallback_action_name: "action_my_fallback"

labels:

pos pos neu neu neg neg

I’ve also created a registry file:

from rasa_nlu.customcomponents.sentiment import MySentimentAnalyzer
from data import sentiment
# all import needed

component_classes = [
    # all rasa components added no change
    MySentimentAnalyzer
]

sentiment.py

from rasa.nlu.components import Component
from rasa.nlu import utils
from rasa.nlu.model import Metadata

import nltk
from nltk.classify import NaiveBayesClassifier
import os

import typing
from typing import Any, Optional, Text, Dict

SENTIMENT_MODEL_FILE_NAME = "sentiment_classifier.pkl"

class SentimentAnalyzer(Component):
    """A custom sentiment analysis"""
    name = "sentiment"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["fr_core_news_md"]
    print('initialised the class')

    def __init__(self, component_config=None):
        super(SentimentAnalyzer, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        """Load the sentiment polarity labels from the text
           file, retrieve training tokens and after formatting
           data train the classifier."""
        with open('labels.txt','r') as f:
            labels = f.read().splitlines()
        training_data = training_data.training_examples #list of Message objects
        tokens = [list(map(lambda x: x.text, t.get('tokens'))) for t in training_data]
        processed_tokens = [self.preprocessing(t) for t in tokens]
        labeled_data = [(t, x) for t,x in zip(processed_tokens, labels)]
        self.clf = NaiveBayesClassifier.train(labeled_data)

    def convert_to_rasa(self, value, confidence):
        """Convert model output into the Rasa NLU compatible output format."""

        entity = {"value": value,
                  "confidence": confidence,
                  "entity": "sentiment",
                  "extractor": "sentiment_extractor"}

        return entity

    def preprocessing(self, tokens):
        """Create bag-of-words representation of the training examples."""

        return ({word: True for word in tokens})

    def process(self, message, **kwargs):
        """Retrieve the tokens of the new message, pass it to the classifier
            and append prediction results to the message class."""

        if not self.clf:
            # component is either not trained or didn't
            # receive enough training data
            entity = None
        else:
            tokens = [t.text for t in message.get("tokens")]
            tb = self.preprocessing(tokens)
            pred = self.clf.prob_classify(tb)

            sentiment = pred.max()
            confidence = pred.prob(sentiment)

            entity = self.convert_to_rasa(sentiment, confidence)

            message.set("entities", [entity], add_to_output=True)

    def persist(self, file_name, model_dir):
        """Persist this model into the passed directory."""
        classifier_file = os.path.join(model_dir, SENTIMENT_MODEL_FILE_NAME)
        utils.json_pickle(classifier_file, self)
        return {"classifier_file": SENTIMENT_MODEL_FILE_NAME}

    @classmethod
    def load(cls,
             meta: Dict[Text, Any],
             model_dir=None,
             model_metadata=None,
             cached_component=None,
             **kwargs):
        file_name = meta.get("classifier_file")
        classifier_file = os.path.join(model_dir, file_name)
        return utils.json_unpickle(classifier_file)

Would you mind sharing how you are labelling your labels.txt file?

I think the labels file could be causing this.

pos pos neu neu neg neg neg

That’s all I’ve put in the labels file, just as a test, but I have a lot of intents!

I would recommend debugging the process method with some print statements to find the point where different inputs end up producing the same confidence.

Are you talking about this function?

def process(self, message, **kwargs):
    """Retrieve the tokens of the new message, pass it to the classifier
    and append prediction results to the message class."""

    if not self.clf:
        # component is either not trained or didn't
        # receive enough training data
        entity = None
    else:
        tokens = [t.text for t in message.get("tokens")]
        tb = self.preprocessing(tokens)
        pred = self.clf.prob_classify(tb)

        sentiment = pred.max()
        confidence = pred.prob(sentiment)

        entity = self.convert_to_rasa(sentiment, confidence)

        message.set("entities", [entity], add_to_output=True)

Yes, that is where the sentiment and confidence come from, so in that else branch you should try different tokens to see if you get the same output. If you don’t, then I would print the message at the beginning of the method to check that it is the message you are actually sending.

Yes, I’ve printed them, and the right tokens are being sent to process.

Okay, that’s a good start – can you print the sentiment and confidence that is predicted here:

sentiment = pred.max()
confidence = pred.prob(sentiment)
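
When printing at this point, it can help to look at the whole distribution rather than only the top label. A minimal sketch, assuming `pred` behaves like the NLTK distribution returned by `prob_classify`, that is, it offers `samples()` and `prob(label)`. The names `describe_prediction` and `FakeDist` are hypothetical and exist only for this illustration:

```python
def describe_prediction(pred):
    """Return (label, probability) pairs, most likely label first."""
    pairs = [(label, pred.prob(label)) for label in pred.samples()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


class FakeDist:
    """Hypothetical stand-in for the object returned by prob_classify()."""

    def __init__(self, probs):
        self._probs = probs

    def samples(self):
        return list(self._probs)

    def prob(self, label):
        return self._probs[label]


if __name__ == "__main__":
    # Inside process() this would be: print(tokens, describe_prediction(pred))
    print(describe_prediction(FakeDist({"pos": 0.2, "neu": 0.5, "neg": 0.3})))
```

If every label comes back with the same probability for every message, the classifier is probably not using the tokens at all, which points at training rather than at process.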

As you can see here, I get the same value and the same entity: [image]

Ok so clearly something is going wrong there – there has to be a bug in your

tb = self.preprocessing(tokens)
pred = self.clf.prob_classify(tb)

sentiment = pred.max()
confidence = pred.prob(sentiment)

So just keep digging and figure out where the different inputs change to the same output. My guess is it is probably in

pred = self.clf.prob_classify(tb)

OK, thank you. Can you explain this part to me? I didn’t understand why we use the training_data here:

    training_data = training_data.training_examples #list of Message objects
    tokens = [list(map(lambda x: x.text, t.get('tokens'))) for t in training_data]
    processed_tokens = [self.preprocessing(t) for t in tokens]
    labeled_data = [(t, x) for t,x in zip(processed_tokens, labels)]
    self.clf = NaiveBayesClassifier.train(labeled_data)

You have to use the training data file to train the classifier, otherwise it doesn’t know which inputs your labels in your labels.txt correspond to

So maybe it’s not the processing step that is going wrong, you’ve just incorrectly trained the classifier?
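
To illustrate how an incorrectly trained classifier can give the same answer for every message, here is a simplified, from-scratch naive Bayes sketch. It is not NLTK's implementation: the add-one smoothing, the function names, and the six toy examples are assumptions made for this illustration. Like NLTK's classifier (as far as I know), it ignores words it never saw during training, so a message made only of unseen words falls back to the label prior:

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_data):
    """labeled_data is a list of (words, label) pairs."""
    label_counts = Counter(label for _, label in labeled_data)
    word_counts = defaultdict(Counter)  # label -> word -> examples containing it
    vocabulary = set()
    for words, label in labeled_data:
        for word in set(words):
            word_counts[label][word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict_proba(model, words):
    """Return a dict mapping each label to its probability for `words`."""
    label_counts, word_counts, vocabulary = model
    total = sum(label_counts.values())
    known = [w for w in set(words) if w in vocabulary]  # unseen words are dropped
    log_scores = {}
    for label, n in label_counts.items():
        score = math.log(n / total)  # label prior
        for word in known:
            # Add-one smoothing (an assumption of this sketch).
            score += math.log((word_counts[label][word] + 1) / (n + 2))
        log_scores[label] = score
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


if __name__ == "__main__":
    # Six made-up examples, labelled pos pos neu neu neg neg.
    data = [(["bonjour"], "pos"), (["salut"], "pos"), (["ok"], "neu"),
            (["peut-être"], "neu"), (["nul"], "neg"), (["idiot"], "neg")]
    model = train_naive_bayes(data)
    print(predict_proba(model, ["connard"]))  # word never seen in training
    print(predict_proba(model, ["nul"]))      # word seen with label neg
```

With three equally frequent labels, an unseen word leaves each label at one third in this sketch, which is one possible explanation for a confidence that stays at 0.333 whatever the message is.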

labels:
pos pos neu neu neg neg

Are these the only labels in your labels.txt? There should be a label to correspond with each input in your training data

I only added a few labels for testing. Should I have the same number of labels as examples in my NLU data (which contains the intents)?

Yes, that’s the point of the labels: they have to correspond to your data. You will have to assign one more label to each NLU training example: the polarity of its sentiment (positive, negative, neutral). Every example needs the correct sentiment defined so that your model can learn which word combinations lead to bad or good sentiment.
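
One thing worth knowing here: the train method pairs examples and labels with `zip`, and `zip` stops at the shorter of its inputs without raising an error. With six labels, only the first six training examples would ever reach the classifier. A minimal sketch of a stricter pairing step; `pair_with_labels` is a hypothetical helper name and the 40 examples are made up:

```python
def pair_with_labels(examples, labels):
    """Pair each example with its label, refusing mismatched lengths."""
    if len(examples) != len(labels):
        raise ValueError(
            f"{len(examples)} training examples but {len(labels)} labels"
        )
    return list(zip(examples, labels))


if __name__ == "__main__":
    examples = [f"example {i}" for i in range(40)]  # made-up examples
    labels = ["pos", "pos", "neu", "neu", "neg", "neg"]
    # Plain zip() silently drops everything after the sixth example.
    print(len(list(zip(examples, labels))))
    try:
        pair_with_labels(examples, labels)
    except ValueError as err:
        print(err)
```

Failing loudly at training time makes a mismatch between labels.txt and the NLU data visible immediately, instead of showing up later as odd predictions.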

Thank you for helping me! I have changed my NLU data, and the value changed to neutral (happy that it changed from pos to neu :D), but it seems to get stuck after the first message:

My nlu file contains:

intent: insultes

  • Connard

  • quel idiot

  • t’es un batard

  • Putain

  • Enculé

  • salaud

  • Tu es une salope

  • bête

  • imbécile

  • Quel con

  • T’es trop nul

  • débile

  • n’importe quoi

  • Fait chier !

  • t’es un abruti

  • Je t’emmerde !

  • Ta gueule !

intent: salutations

  • bonjour Liloo

  • bonjour

  • salut

  • bonsoir

  • bojour !

  • bonjor

  • bonjur

  • salut

  • bnoour

  • bjonur

  • slt

  • bonjour chatbot

  • salt

  • bjour

  • bsr

  • bnosoir

  • hola

  • hello !

  • coucou

  • yo

  • oyé

  • bonjour bonsoir

  • je te souhaite bien le bonjour

My labels file contains :

neg neg neg neg neg neg neg neg neg neg neg neg neg neg neg neg neg

pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos
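
Keeping a separate labels file in step with the NLU file by hand is easy to get wrong. Since the labels above follow the intents (every insultes example is neg, every salutations example is pos), one option is to derive them from the intent names. A minimal sketch: the mapping, the neu default, and the function names are assumptions for illustration. The output has one label per line because the component reads the file with `splitlines()`. It only helps if the order matches the order in which Rasa hands the examples to train, which I have not verified:

```python
# Hypothetical mapping from intent name to sentiment polarity.
INTENT_TO_SENTIMENT = {"insultes": "neg", "salutations": "pos"}


def labels_for(intents, default="neu"):
    """Return one sentiment label per intent, in the same order."""
    return [INTENT_TO_SENTIMENT.get(intent, default) for intent in intents]


def labels_file_content(intents):
    """Render the labels with one label per line."""
    return "\n".join(labels_for(intents)) + "\n"


if __name__ == "__main__":
    # 17 insultes examples followed by 23 salutations examples, as above.
    intents = ["insultes"] * 17 + ["salutations"] * 23
    print(labels_file_content(intents))
```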


I think the problem is in pred = self.clf.prob_classify(tb): while digging, I noticed that the output at this point is always the same, but I don’t know what to do about it. Can you help me, please?

I don’t know about that classifier, as it’s not rasa code, it comes from nltk. If for some reason it’s acting up, you’d have to take it up with them. But first you should also look at how it is trained by debugging the train function.

    labeled_data = [(t, x) for t,x in zip(processed_tokens, labels)]
    self.clf = NaiveBayesClassifier.train(labeled_data)

So you need to look at labeled_data and make sure it looks right.
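
A quick way to do that is to summarise labeled_data just before it is passed to the classifier. A minimal sketch, assuming labeled_data is a list of (bag-of-words dict, label) pairs as built in the train method; `summarize_labeled_data` is a hypothetical helper and the sample data is made up:

```python
from collections import Counter


def summarize_labeled_data(labeled_data, preview=3):
    """Collect basic sanity-check figures about the training pairs."""
    label_counts = Counter(label for _, label in labeled_data)
    vocabulary = set()
    for features, _ in labeled_data:
        vocabulary.update(features)  # dict keys are the words
    return {
        "n_examples": len(labeled_data),
        "label_counts": dict(label_counts),
        "vocabulary_size": len(vocabulary),
        "preview": labeled_data[:preview],
    }


if __name__ == "__main__":
    labeled_data = [
        ({"connard": True}, "neg"),
        ({"quel": True, "idiot": True}, "neg"),
        ({"bonjour": True}, "pos"),
    ]
    print(summarize_labeled_data(labeled_data))
```

Things to check in the output: the number of examples should match the size of the NLU data, every label should appear roughly as often as expected, and the previewed pairs should show the right label next to the right words.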

I am also getting the same result. Can anyone please help me?

I didn’t find a solution; I’m looking for another approach :(