Sentiment analysis issue! Need help please!

m2cci-sba · July 17, 2019, 12:30pm

I always get the same result, no matter which message I send.

A positive sentiment is always detected, with confidence 0.333:

{'intent': {'name': 'insultes', 'confidence': 0.35888357915887636},
 'entities': [{'value': 'pos', 'confidence': 0.3333333333333333, 'entity': 'sentiment', 'extractor': 'sentiment_extractor'}],
 'intent_ranking': [{'name': 'insultes', 'confidence': 0.35888357915887636},
                    {'name': 'salutations', 'confidence': 0.07922486366609584},
                    {'name': 'au_revoir', 'confidence': 0.06963384023417138},
                    {'name': 'out', 'confidence': 0.057168641232126574},
                    {'name': 'positive_answer', 'confidence': 0.05699548462320999},
                    {'name': 'congrats', 'confidence': 0.05146011802312083},
                    {'name': 'salaire', 'confidence': 0.05105493986389435},
                    {'name': 'annulation', 'confidence': 0.04765725667849151},
                    {'name': 'negative_answer', 'confidence': 0.04180718673910084},
                    {'name': 'merci', 'confidence': 0.03432302984684233}],
 'text': 'connard'}

My data contains NLU examples and stories.

My config file:

# Configuration for Rasa NLU.
# Components
language: fr_core_news_md
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "sentiment.SentimentAnalyzer"
- name: "SpacyFeaturizer"
- name: "WhitespaceTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
- name: "SklearnIntentClassifier"

# Configuration for Rasa Core.
# Policies
policies:
- name: MemoizationPolicy
- name: KerasPolicy
- name: MappingPolicy
- name: FormPolicy
- name: "FallbackPolicy"
  nlu_threshold: 0.25
  core_threshold: 0.3
  fallback_action_name: "action_my_fallback"

labels:

pos pos neu neu neg neg

I’ve also created a registry file:

from rasa_nlu.customcomponents.sentiment import MySentimentAnalyzer

from data import sentiment

# all import needed

component_classes = [

# all rasa components added no change

MySentimentAnalyzer

]

sentiment.py

from rasa.nlu.components import Component
from rasa.nlu import utils
from rasa.nlu.model import Metadata

import nltk
from nltk.classify import NaiveBayesClassifier
import os

import typing
from typing import Any, Optional, Text, Dict

SENTIMENT_MODEL_FILE_NAME = "sentiment_classifier.pkl"


class SentimentAnalyzer(Component):
    """A custom sentiment analysis"""
    name = "sentiment"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["fr_core_news_md"]
    print('initialised the class')

    def __init__(self, component_config=None):
        super(SentimentAnalyzer, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        """Load the sentiment polarity labels from the text
           file, retrieve training tokens and after formatting
           data train the classifier."""
        with open('labels.txt', 'r') as f:
            labels = f.read().splitlines()
        training_data = training_data.training_examples  # list of Message objects
        tokens = [list(map(lambda x: x.text, t.get('tokens'))) for t in training_data]
        processed_tokens = [self.preprocessing(t) for t in tokens]
        labeled_data = [(t, x) for t, x in zip(processed_tokens, labels)]
        self.clf = NaiveBayesClassifier.train(labeled_data)

    def convert_to_rasa(self, value, confidence):
        """Convert model output into the Rasa NLU compatible output format."""
        entity = {"value": value,
                  "confidence": confidence,
                  "entity": "sentiment",
                  "extractor": "sentiment_extractor"}

        return entity

    def preprocessing(self, tokens):
        """Create bag-of-words representation of the training examples."""
        return ({word: True for word in tokens})

    def process(self, message, **kwargs):
        """Retrieve the tokens of the new message, pass it to the classifier
            and append prediction results to the message class."""
        if not self.clf:
            # component is either not trained or didn't
            # receive enough training data
            entity = None
        else:
            tokens = [t.text for t in message.get("tokens")]
            tb = self.preprocessing(tokens)
            pred = self.clf.prob_classify(tb)

            sentiment = pred.max()
            confidence = pred.prob(sentiment)

            entity = self.convert_to_rasa(sentiment, confidence)

            message.set("entities", [entity], add_to_output=True)

    def persist(self, file_name, model_dir):
        """Persist this model into the passed directory."""
        classifier_file = os.path.join(model_dir, SENTIMENT_MODEL_FILE_NAME)
        utils.json_pickle(classifier_file, self)
        return {"classifier_file": SENTIMENT_MODEL_FILE_NAME}

    @classmethod
    def load(cls,
             meta: Dict[Text, Any],
             model_dir=None,
             model_metadata=None,
             cached_component=None,
             **kwargs):
        file_name = meta.get("classifier_file")
        classifier_file = os.path.join(model_dir, file_name)
        return utils.json_unpickle(classifier_file)

gcgloven · July 18, 2019, 3:14am

Would you mind sharing how you are labelling your labels.txt file?

I think the labels file could be causing this.

m2cci-sba · July 18, 2019, 7:05am

pos pos neu neu neg neg neg

That’s all I’ve put in the labels file, just as a test, but I have a lot of intents!

erohmensing · July 18, 2019, 9:27am

I would recommend debugging the process method with some print statements to see at which point different inputs end up with the same confidence.

m2cci-sba · July 18, 2019, 9:48am

Do you mean this function?

def process(self, message, **kwargs):
    """Retrieve the tokens of the new message, pass it to the classifier
        and append prediction results to the message class."""

    if not self.clf:
        # component is either not trained or didn't
        # receive enough training data
        entity = None
    else:
        tokens = [t.text for t in message.get("tokens")]
        tb = self.preprocessing(tokens)
        pred = self.clf.prob_classify(tb)

        sentiment = pred.max()
        confidence = pred.prob(sentiment)

        entity = self.convert_to_rasa(sentiment, confidence)

        message.set("entities", [entity], add_to_output=True)

erohmensing · July 18, 2019, 10:19am

Yes, that is where the sentiment and confidence come from, so in that else branch you should try different tokens to see whether you get the same output. If you don’t, then I would print the “message” at the beginning of the method to check that it is the message you are actually sending.

m2cci-sba · July 18, 2019, 2:03pm

Yes, I’ve printed them, and the right tokens are being passed to process.

erohmensing · July 18, 2019, 2:24pm

Okay, that’s a good start – can you print the sentiment and confidence that is predicted here:

sentiment = pred.max()
confidence = pred.prob(sentiment)
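
One way to do that, sketched under assumptions: besides the winning label, print the probability of every label, since an identical distribution for every message would point at training rather than at `process`. The helper name `describe_prediction` is made up for illustration; it relies only on the `samples()` and `prob()` methods that NLTK probability distributions expose. `FakeDist` is a stand-in so the snippet runs without a trained classifier.

```python
def describe_prediction(pred):
    """Return {label: probability} for every label in a probability distribution.

    `pred` is expected to offer `samples()` and `prob(label)`, as the object
    returned by `self.clf.prob_classify(tb)` does.
    """
    return {label: pred.prob(label) for label in pred.samples()}


class FakeDist:
    """Hypothetical stand-in for an NLTK distribution, for demonstration only."""

    def __init__(self, probs):
        self._probs = dict(probs)

    def samples(self):
        return self._probs.keys()

    def prob(self, label):
        return self._probs.get(label, 0.0)


pred = FakeDist({"pos": 1 / 3, "neu": 1 / 3, "neg": 1 / 3})
distribution = describe_prediction(pred)
print(distribution)
```

Inside `process`, a line such as `print(tokens, describe_prediction(pred))` right after the `prob_classify` call would show whether the whole distribution, and not just the top label, stays the same across messages.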

m2cci-sba · July 18, 2019, 2:39pm

As you can see here, I get the same value and the same entity.

erohmensing · July 18, 2019, 3:56pm

Ok so clearly something is going wrong there – there has to be a bug in your

tb = self.preprocessing(tokens)
pred = self.clf.prob_classify(tb)

sentiment = pred.max()
confidence = pred.prob(sentiment)

So just keep digging and figure out where the different inputs change to the same output. My guess is it is probably in

pred = self.clf.prob_classify(tb)
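
A possible explanation, offered as a hypothesis rather than a diagnosis: if none of the tokens in the new message occurred in the examples the classifier was trained on, a naive Bayes model has nothing to go on except the label prior. The sketch below assumes that NLTK's default estimator uses add-0.5 (expected likelihood) smoothing and that feature names never seen in training are ignored at prediction time. `smoothed_label_prior` and `known_tokens` are illustrative names, not NLTK functions, and the vocabulary is made up.

```python
from collections import Counter


def smoothed_label_prior(labels, gamma=0.5):
    """Label probabilities with add-gamma smoothing over the observed labels."""
    counts = Counter(labels)
    total = len(labels) + gamma * len(counts)
    return {label: (count + gamma) / total for label, count in counts.items()}


def known_tokens(tokens, training_vocabulary):
    """Keep only tokens seen in training; the others carry no evidence."""
    return [token for token in tokens if token in training_vocabulary]


labels = ["pos", "pos", "neu", "neu", "neg", "neg"]
training_vocabulary = {"bonjour", "salut", "merci"}  # made-up example words

prior = smoothed_label_prior(labels)
evidence = known_tokens(["connard"], training_vocabulary)

print({label: round(p, 4) for label, p in prior.items()})
print(evidence)
```

With two labels per class, every class gets a prior of one third, which would match the 0.333 confidence reported in the first post. Whether that is what actually happens here can only be confirmed by inspecting the trained classifier.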

m2cci-sba · July 19, 2019, 8:31am

OK, thank you. Can you explain this to me? I don’t understand why we use the training_data file:

training_data = training_data.training_examples  # list of Message objects
tokens = [list(map(lambda x: x.text, t.get('tokens'))) for t in training_data]
processed_tokens = [self.preprocessing(t) for t in tokens]
labeled_data = [(t, x) for t, x in zip(processed_tokens, labels)]
self.clf = NaiveBayesClassifier.train(labeled_data)

erohmensing · July 19, 2019, 8:51am

You have to use the training data file to train the classifier, otherwise it doesn’t know which inputs your labels in your labels.txt correspond to

So maybe it’s not the processing step that is going wrong, you’ve just incorrectly trained the classifier?

labels:
pos pos neu neu neg neg

Are these the only labels in your labels.txt? There should be a label to correspond with each input in your training data

m2cci-sba · July 19, 2019, 9:09am

I only wrote a few labels for testing. Should I have the same number of labels as examples in my NLU data (which contains the intents)?

erohmensing · July 19, 2019, 9:16am

Yes, that’s the point of the labels: they have to line up with your data. You will have to assign one more label to each NLU training example: the polarity of its sentiment (positive, negative, neutral). Every example needs the correct sentiment defined so that your model can learn which word combinations lead to bad or good sentiment.
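
To illustrate why the counts have to match: in the `train` method posted earlier, `zip(processed_tokens, labels)` stops at the shorter of the two lists, so any training example without a label never reaches the classifier. A minimal sketch with made-up data (`pair_examples_with_labels` is a hypothetical helper, not part of Rasa or NLTK):

```python
def pair_examples_with_labels(examples, labels):
    """Pair each example with its label, refusing silently truncated data."""
    if len(examples) != len(labels):
        raise ValueError(
            f"{len(examples)} training examples but {len(labels)} labels"
        )
    return list(zip(examples, labels))


examples = ["Connard", "quel idiot", "bonjour", "salut", "merci"]
labels = ["neg", "neg", "pos"]

# Plain zip() drops the examples that have no label.
truncated = list(zip(examples, labels))
print(len(truncated))

# The strict helper reports the mismatch instead of hiding it.
try:
    pair_examples_with_labels(examples, labels)
except ValueError as err:
    print(err)
```

Raising an error on a mismatch is a design choice: it makes a too-short labels file visible at training time instead of producing a classifier that has only seen part of the data.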

m2cci-sba · July 19, 2019, 9:54am

Thank you for helping me! I have changed my NLU data, and the value changed to neutral (happy that it changed from pos to neu :D), but it seems to get stuck after the first message:

m2cci-sba · July 19, 2019, 9:55am

My nlu file contains:

intent: insultes

Connard
quel idiot
t’es un batard
Putain
Enculé
salaud
Tu es une salope
bête
imbécile
Quel con
T’es trop nul
débile
n’importe quoi
Fait chier !
t’es un abruti
Je t’emmerde !
Ta gueule !

intent: salutations

bonjour Liloo
bonjour
salut
bonsoir
bojour !
bonjor
bonjur
salut
bnoour
bjonur
slt
bonjour chatbot
salt
bjour
bsr
bnosoir
hola
hello !
coucou
yo
oyé
bonjour bonsoir
je te souhaite bien le bonjour

My labels file contains :

neg neg neg neg neg neg neg neg neg neg neg neg neg neg neg neg neg

pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos pos
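
One thing that may be worth checking, assuming the file really is laid out as shown here: `train` reads the labels with `f.read().splitlines()`, which produces one entry per line. If several labels share a line, each whole line becomes a single label, and a blank line becomes an empty label. A hedged sketch of a more forgiving parser (`parse_labels` is an illustrative name, and the allowed label set is an assumption based on the labels used in this thread):

```python
def parse_labels(text, allowed=("pos", "neu", "neg")):
    """Split label text on any whitespace and validate every label."""
    labels = text.split()
    unknown = sorted(set(labels) - set(allowed))
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    return labels


text = "neg neg neg\n\npos pos\n"

by_line = text.splitlines()    # what f.read().splitlines() would return
by_token = parse_labels(text)  # one entry per label, blank lines ignored

print(by_line)
print(by_token)
```

If the real file already has exactly one label per line and no blank lines, both approaches give the same list and this is not the problem.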

m2cci-sba · July 22, 2019, 9:35am

I think the problem is here: pred = self.clf.prob_classify(tb). While digging, I noticed that the output of this call is the same every time, but I don’t know what to do about it. Can you help me please?

erohmensing · July 22, 2019, 10:31am

I don’t know about that classifier, as it’s not rasa code, it comes from nltk. If for some reason it’s acting up, you’d have to take it up with them. But first you should also look at how it is trained by debugging the train function.

    labeled_data = [(t, x) for t,x in zip(processed_tokens, labels)]
    self.clf = NaiveBayesClassifier.train(labeled_data)

So you need to look at labeled_data and make sure it looks right.
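
A sketch of what such a check could look like, assuming `labeled_data` is a list of `(features, label)` pairs where `features` is the `{word: True}` dict built by `preprocessing`. `summarize_labeled_data` is a hypothetical helper and the data below is made up:

```python
from collections import Counter


def summarize_labeled_data(labeled_data):
    """Count examples per label and collect the vocabulary seen in training."""
    label_counts = Counter(label for _, label in labeled_data)
    vocabulary = set()
    for features, _ in labeled_data:
        vocabulary.update(features)
    return label_counts, vocabulary


labeled_data = [
    ({"Connard": True}, "neg"),
    ({"quel": True, "idiot": True}, "neg"),
    ({"bonjour": True}, "pos"),
]

label_counts, vocabulary = summarize_labeled_data(labeled_data)
print(len(labeled_data), dict(label_counts))
print(sorted(vocabulary))
```

If the number of pairs is smaller than the number of NLU examples, or the vocabulary lacks the words you test with, the classifier never saw them. Note that dictionary keys are case-sensitive, so `Connard` and `connard` count as different words in this representation.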

Chaitanya · July 26, 2019, 7:25am

I am also getting the same result. Can anyone please help me?

m2cci-sba · July 26, 2019, 7:37am

I didn’t find a solution; I’m looking for another approach.
