Enhancing Rasa NLU models with Custom Components

I have a question about the second example, the pretrained sentiment analyzer. We define sid = SentimentIntensityAnalyzer() in process. Will this create a new instance each time process is called?


Hey @Juste

I was wondering if Custom Components could solve a problem that I’m having. I need an Entity that is specific to each user. For example, UserA may have an Entity “FamilyMembers” with [“John”, “Mary”, “Evgeni”] while UserB may have an Entity “FamilyMembers” with [“Leandro”]. A name such as Leandro would not normally be detected, as it is not a common name.

When one of our users is connected, we would like the Entity to be limited to that user's data only. This way, we can guarantee that the name matches, and detection is hopefully more accurate. DialogFlow calls this SessionEntity.

I was thinking a Custom Component could create a simple regex filter with the names of the family members. However, I’m not sure if there is a way to pass additional data to the component so that it can match against the user’s family members. Is this possible, or is there something else that could at least help us solve “unknown names”?
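The regex idea above can be sketched in plain Python. This is only an illustration, not a Rasa component: how the per-user name list reaches the component is exactly the open question, so the sketch assumes the names are already available as a list. The function names and the extractor label `family_member_regex` are hypothetical.

```python
import re
from typing import Dict, List


def build_name_pattern(names: List[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive pattern that matches any of the given names."""
    # Escape each name so it is matched literally; try longer names first.
    alternatives = sorted((re.escape(n) for n in names), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def extract_family_members(text: str, names: List[str]) -> List[Dict[str, object]]:
    """Return entity dicts for every listed name found in the text."""
    if not names:
        return []
    pattern = build_name_pattern(names)
    return [
        {
            "entity": "FamilyMembers",
            "value": m.group(0),
            "start": m.start(),
            "end": m.end(),
            "extractor": "family_member_regex",  # hypothetical extractor name
        }
        for m in pattern.finditer(text)
    ]


# Assumption: this list comes from your own per-user data store.
user_b_names = ["Leandro"]
entities = extract_family_members("Please call Leandro tomorrow", user_b_names)
```

Because the pattern is built only from one user's names, an uncommon name like Leandro is matched for UserB and for nobody else.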

Thank you for a great blog btw.

@Juste I'm facing a problem while creating a sentiment component that uses only nltk as a custom component, because it needs the model's metadata to be defined. You can find the complete issue in this question.

thanks in advance

Hi, can anyone help me with my sentiment component? I followed the blog post, and here is my code:

from rasa.nlu.components import Component
from rasa.nlu import utils
from rasa.nlu.model import Metadata

import nltk
from nltk.classify import NaiveBayesClassifier
from nltk.tokenize import word_tokenize # or use some other tokenizer
import json
import os

import typing
from typing import Any, Optional, Text, Dict

SENTIMENT_MODEL_FILE_NAME = "sentiment_classifier.pkl"



class SentimentAnalyzer(Component):
    """A custom sentiment analysis component"""
    name = "sentiment"
    provides = ["entities"]
    requires = ["tokens"]
    defaults = {}
    language_list = ["en"]
    print('initialised the class')

    def __init__(self, component_config=None):
        super(SentimentAnalyzer, self).__init__(component_config)
        # Start untrained so that process() can safely check self.clf
        self.clf = None

    def train(self, training_data, cfg, **kwargs):
        """Load the sentiment polarity labels from the text
           file, retrieve training tokens and after formatting
           data train the classifier."""
        self.training = []
        
        with open('./default_dataset_training.json', 'r') as raw_training_data:
            training_data = json.load(raw_training_data)
            print(training_data)
            neg = training_data['neg']
            for val in neg:
                self.training.append((val[0]['value'], 'neg'))
            
            pos = training_data['pos']
            for val_pos in pos:
                self.training.append((val_pos[0]['value'], 'pos'))

            processed_training = []
            for t in self.training:
                processed_training.append((self.preprocessing(word_tokenize(t[0])), t[1]))
                    
            self.clf = NaiveBayesClassifier.train(processed_training)



    def convert_to_rasa(self, value, confidence):
        """Convert model output into the Rasa NLU compatible output format."""

        entity = {"value": value,
                  "confidence": confidence,
                  "entity": "sentiment",
                  "extractor": "sentiment_extractor"}

        return entity
        

    def preprocessing(self, tokens):
        """Create bag-of-words representation of the training examples."""
        
        return ({word: True for word in tokens})


    def process(self, message, **kwargs):
        """Retrieve the tokens of the new message, pass it to the classifier
            and append prediction results to the message class."""
        
        if not self.clf:
            # component is either not trained or didn't
            # receive enough training data
            entity = None
        else:
            tokens = [t.text for t in message.get("tokens")]
            processed = self.preprocessing(tokens)
            pred = self.clf.prob_classify(processed)
            sentiment = pred.max()
            confidence = pred.prob(sentiment)

            entity = self.convert_to_rasa(sentiment, confidence)

            message.set("entities", [entity], add_to_output=True)


    def persist(self, file_name, model_dir):
        """Persist this model into the passed directory."""
        classifier_file = os.path.join(model_dir, SENTIMENT_MODEL_FILE_NAME)
        utils.json_pickle(classifier_file, self)
        return {"classifier_file": SENTIMENT_MODEL_FILE_NAME}

    @classmethod
    def load(cls,
             meta: Dict[Text, Any],
             model_dir=None,
             model_metadata=None,
             cached_component=None,
             **kwargs):
        file_name = meta.get("classifier_file")
        classifier_file = os.path.join(model_dir, file_name)
        return utils.json_unpickle(classifier_file)

Here is my config:

language: en
pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "sentiment.SentimentAnalyzer"
- name: "ner_crf"
- name: "ner_spacy"
- name: "ner_synonyms"
- name: CountVectorsFeaturizer
- intent_split_symbol: +
  intent_tokenization_flag: true
  name: EmbeddingIntentClassifier

When I test my NLU model, I always get the same result:

{
      "value": "neg",
      "confidence": 0.696105702364395,
      "entity": "sentiment",
      "extractor": "sentiment_extractor"
    }

But when I test the same code with the same training data as a standalone script:


import json

from nltk.classify import NaiveBayesClassifier
from nltk.tokenize import word_tokenize # or use some other tokenizer

training = []

def preprocessing(tokens):
    """Create bag-of-words representation of the training examples."""
    
    return ({word: True for word in tokens})
        
with open('./default_dataset_training.json', 'r') as raw_training_data:
    training_data = json.load(raw_training_data)
    print(training_data)
    neg = training_data['neg']
    for val in neg:
        training.append((val[0]['value'], 'neg'))
    
    pos = training_data['pos']
    for val_pos in pos:
        training.append((val_pos[0]['value'], 'pos'))

    processed_training = []
    for t in training:
        processed_training.append((preprocessing(word_tokenize(t[0])), t[1]))
            
    clf = NaiveBayesClassifier.train(processed_training)

    while True:
        text = input(">")
        tokenize = word_tokenize(text)
        processed = preprocessing(tokenize)
        pred = clf.prob_classify(processed)
        sentiment = pred.max()
        confidence = pred.prob(sentiment)

        print(sentiment)
        print(confidence)

It works fine. Can someone help me with this? Thanks

@Juste Hi, I am facing an issue implementing custom components in rasa_nlu. In the config I have put name: “sentiment.SentimentAnalyzer”

The error that I am getting during training

Can you please help me on this? I am using rasa_nlu 0.15.0

Hi @rideep. You shouldn’t put the custom components code inside the rasa_nlu package. The file should sit in your project (assistant’s) directory. Do you get the same error if you structure your project files that way?

Could anyone please answer this question: how can I load a model a single time and reuse it every time?

Hi @Juste. I was implementing the sentiment analysis component, but when training I keep getting the error: AttributeError: module ‘rasa.nlu.utils’ has no attribute ‘json_pickle’. I am running Rasa version 1.10.0. I tried to use the Python pickle module as well, but no luck yet. Is there anything specific that I am missing? For starters, I just used the code in your blog post.
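In Python, any statement in the body of process runs again on every call, so an object constructed there is rebuilt for each message. A common way to avoid this is to construct it once in __init__ and keep it on self. The sketch below shows the pattern with a hypothetical stand-in class (`ExpensiveAnalyzer`) rather than a real NLTK or Rasa class, so the names and the toy scoring rule are assumptions for illustration only.

```python
class ExpensiveAnalyzer:
    """Hypothetical stand-in for a model that is costly to construct."""

    instances_created = 0

    def __init__(self):
        # Count constructions so the effect is visible.
        ExpensiveAnalyzer.instances_created += 1

    def score(self, text):
        # Toy scoring rule, only here so the example runs.
        return 1.0 if "good" in text.lower() else -1.0


class SentimentComponent:
    """Simplified component: the analyzer is built once and reused."""

    def __init__(self):
        # Runs once, when the component object is created.
        self.analyzer = ExpensiveAnalyzer()

    def process(self, text):
        # Runs for every message and reuses the existing analyzer.
        return self.analyzer.score(text)


component = SentimentComponent()
scores = [component.process(t) for t in ["good day", "bad day", "so good"]]
```

After three calls to process, `ExpensiveAnalyzer.instances_created` is still 1, whereas constructing the analyzer inside process would have created three.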

AttributeError: module ‘rasa.nlu.utils’ has no attribute ‘json_pickle’

Hi @samgpt! I just read this, and I had the same issue while following the tutorial Enhancing Rasa NLU models with Custom Components… In Rasa version 1.10.x, try using this:

import rasa.utils.io as io_utils

and:

io_utils.json_pickle()

io_utils.json_unpickle()

This works for me. I’m using Rasa 1.10.2.

I'm writing this here in case someone else needs it. Regards!

Oops! I realized that there was a problem when the model is saved or loaded, and the SentimentAnalyzer always gave me the same answer! In the end I used pickle, as Collen suggests in the thread “Sentiment analysis issue! Need help please!”
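A minimal sketch of what "using pickle" can look like: only the trained classifier is written to disk, and the returned metadata records the file name so that load can find it again. To keep the example runnable without NLTK or Rasa, a plain dict stands in for the trained classifier; the helper names and signatures here are assumptions, not the exact method signatures Rasa expects.

```python
import os
import pickle
import tempfile

SENTIMENT_MODEL_FILE_NAME = "sentiment_classifier.pkl"


def persist(clf, model_dir):
    """Pickle only the classifier and return the metadata needed to find it."""
    with open(os.path.join(model_dir, SENTIMENT_MODEL_FILE_NAME), "wb") as f:
        pickle.dump(clf, f)
    return {"classifier_file": SENTIMENT_MODEL_FILE_NAME}


def load(meta, model_dir):
    """Restore the classifier from the file named in the metadata."""
    with open(os.path.join(model_dir, meta["classifier_file"]), "rb") as f:
        return pickle.load(f)


def classify(clf, tokens):
    """Toy prediction rule for the stand-in classifier."""
    return "pos" if any(clf.get(tok) == "pos" for tok in tokens) else "neg"


# Stand-in for a trained model: a plain word-to-label mapping (assumption).
toy_classifier = {"great": "pos", "love": "pos"}

with tempfile.TemporaryDirectory() as model_dir:
    meta = persist(toy_classifier, model_dir)
    restored = load(meta, model_dir)

predictions = [classify(restored, ["i", "love", "it"]), classify(restored, ["meh"])]
```

Checking that the restored object gives different answers for different inputs, as in the last line, is a quick way to confirm that the save and load round trip kept the trained state.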

Hey, can I add 2 custom components? My config file looks like this:

- name: WhitespaceTokenizer

- name: component 1

- name: component 2

- name: RegexFeaturizer …

Whenever I do this, only component 2 is used. Why is this happening?

Also, the form stops working when I use the custom sentiment component.


With custom components, make sure the component is not overwriting the object you want to output (tokens, features or entities). It is possible that the second component initializes a new object and overrides the list you want to append to.

This happened to me when I introduced a custom entity extractor.

An example would be:

message.set(
            ENTITIES, message.get(ENTITIES, []) + extracted_entities, add_to_output=True
        )
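The difference between overwriting and appending can be seen with a small stand-in. The `Message` class below is a hypothetical, stripped-down imitation with only get and set, written so the example runs on its own; it is not Rasa's actual class.

```python
class Message:
    """Minimal, hypothetical stand-in with only get() and set()."""

    def __init__(self):
        self.data = {}

    def get(self, prop, default=None):
        return self.data.get(prop, default)

    def set(self, prop, value, add_to_output=False):
        self.data[prop] = value


ENTITIES = "entities"


def overwrite(message, new_entities):
    # Replaces whatever earlier components stored under ENTITIES.
    message.set(ENTITIES, new_entities, add_to_output=True)


def append(message, new_entities):
    # Keeps earlier results and adds the new ones after them.
    message.set(ENTITIES, message.get(ENTITIES, []) + new_entities, add_to_output=True)


first = [{"entity": "name", "value": "Leandro"}]
second = [{"entity": "sentiment", "value": "pos"}]

lost = Message()
overwrite(lost, first)
overwrite(lost, second)  # the "name" entity is gone now

kept = Message()
append(kept, first)
append(kept, second)  # both entities are present
```

A component that sets the entity list to only its own result behaves like `overwrite` here, which would explain entities from other extractors disappearing.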

Hi @Juste, I am using Rasa 2.0. I created a Python package named customcomponent inside the rasa nlu folder, and inside it a file sentiment.py. In registry.py I added the import from rasa.nlu.customcomponent import SentimentAnalyzer and added the class name SentimentAnalyzer to the component_classes list. I then added the component to config.yml as - name: sentiment.SentimentAnalyzer, but it gives me this error: ModuleNotFoundError: No module named ‘sentiment’. Can you help me, please?

I’m trying to make a custom component for my bot, and I’m getting an error on this line: tokens = [list(map(lambda x: x.text, t.get('tokens'))) for t in training_data] TypeError: 'NoneType' object is not iterable

Can anyone help me out? What I understood is that the objects in the training data do not have anything like tokens.

@hemanthyernagula it sounds like training_data is None… Can you check this?

I am getting a similar error but when I check training_data it is not None.
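If training_data itself is not None, the same TypeError is raised when t.get('tokens') returns None for some example, because map then tries to iterate over None. One possible cause, stated here as an assumption, is that no tokenizer ran before the custom component in the pipeline. A defensive version of the list comprehension is sketched below; `Example` and `Token` are hypothetical stand-ins so that it runs on its own.

```python
class Token:
    """Hypothetical token with a .text attribute."""

    def __init__(self, text):
        self.text = text


class Example:
    """Hypothetical training example with a dict-like get()."""

    def __init__(self, text, tokens=None):
        self.data = {"text": text}
        if tokens is not None:
            self.data["tokens"] = tokens

    def get(self, prop, default=None):
        return self.data.get(prop, default)


def token_texts(examples):
    """Collect token texts per example, treating missing tokens as empty."""
    return [[tok.text for tok in (ex.get("tokens") or [])] for ex in examples]


examples = [
    Example("hello there", [Token("hello"), Token("there")]),
    Example("no tokens here"),  # get("tokens") returns None for this one
]
result = token_texts(examples)
```

The guard hides the symptom only; if every example comes back empty, the tokens are probably never being set upstream.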

@abhi are you following the tutorial? If not, could you make a new post for this question, so that we can keep this one from getting (even more) cluttered? Feel free to tag me on the new post :) Otherwise, could you share what you have modified from the tutorial?

Hello @Juste, this is what I did:

  1. Created sentiments.py

  2. Updated my pipeline in config.yml

  3. Created labels.txt

I have trained my bot, but when I run it (rasa shell --debug) I get nothing, and I can’t print the sentiment and the confidence. I’m very confused.

Hello @Juste

  1. Can you please confirm that sentiment.py should go in the main project directory and not inside the actions folder?

  2. Can you also tell me how I can check the Rasa NLU output? I usually use tracker.events. But if I have to check whether my custom component has been called and is giving me the right results, how can I check where the output of convert_to_rasa() is being added?

  3. Could you also give more insight into setting the PYTHONPATH so that my custom component is picked up by Rasa? By that I mean more step-by-step instructions. Does it mean I just run ‘which python’, add that to my PYTHONPATH, and then add ‘export PYTHONPATH=/path_to_your_project_dir/:$PYTHONPATH’ to ~/.bash_profile?

  4. The last question is about using Hugging Face models with ‘import pipeline’ inside custom components. Do you see any issues with that approach?
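On item 3: PYTHONPATH is a list of directories that Python searches for modules, so the entry that matters is the directory containing the component file, not the interpreter path printed by ‘which python’. A quick way to check whether a module can be found is sketched below. The module name `my_sentiment_demo` and the temporary directory are made up for the example; in a real project you would test the module name used in config.yml from inside the project environment.

```python
import importlib
import importlib.util
import os
import sys
import tempfile


def can_import(module_name):
    """Return True if Python can locate the module on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical project directory containing a custom component file.
project_dir = tempfile.mkdtemp()
with open(os.path.join(project_dir, "my_sentiment_demo.py"), "w") as f:
    f.write("class SentimentAnalyzer:\n    pass\n")

found_before = can_import("my_sentiment_demo")

# Directories listed in PYTHONPATH end up on sys.path at interpreter startup;
# inserting the directory here has the same effect for this process.
sys.path.insert(0, project_dir)
importlib.invalidate_caches()

found_after = can_import("my_sentiment_demo")
```

If the check fails for your component's module, a ModuleNotFoundError at training time is the expected result.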

Looking forward to hearing from you. Thanks

Here is the latest blog regarding implementing custom components in rasa 3.0
