Clarification regarding NLU Pipeline and DIETClassifier

Hey there everybody!

I’m trying to get a better understanding of the pipeline in general and of how the DIETClassifier works. I think I have good basic knowledge, but I need some help to solidify my understanding of some of the concepts.

I have already read through a lot of resources, such as the Rasa Masterclass ebook and the documentation, watched plenty of Rasa YouTube videos (both the Masterclass and the Algorithm Whiteboard), and also had a look at the paper for the DIETClassifier. But I’m having a hard time wrapping my head around some aspects, so I thought I’d ask the community.

I will state some facts about an example pipeline and then ask related questions to help me clarify the concepts. Thanks!

The pipeline contains multiple components that process data sequentially. Every utterance is processed by these components during training as well as in production. The chatbot learns how to extract entities during the training phase, and afterwards uses the learned model to classify intents and to extract entities. So far so good. Let’s consider the pipeline described in the documentation under Sensible Starting Pipelines:


  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

In the first step we load the pre-trained embeddings of a spaCy-supported model. This allows our model to get a certain “feeling” for a language without us having to define a lot of training data ourselves. The tokenizer splits utterances into tokens, which the featurizers convert into dense numerical vectors that try to represent the information of a word. These numbers can then be fed into our DIETClassifier for training, so that it learns how to extract entities and classify intents.

I hope what I stated so far was true. Now to my further questions:

What happens to my own training data within the pipeline, in contrast to the training data that is provided through the spaCy language model? Is the training data just thrown together and then fed through the pipeline, or do the different kinds of training data somehow use different components?

Do multiple featurizers mean that the data is shaped further down the road (since it’s a pipeline), or do they create separate features? The documentation also states that the SpacyFeaturizer provides pre-trained word embeddings from GloVe or fastText. How do I know which one is being used? I couldn’t find further information on this.

Now, the paper and the associated YouTube video state that the DIETClassifier uses a pre-trained embedding like BERT, GloVe or ConveRT during training. Is spaCy used in our case, or should the two not be confused with each other? How is it related to the SpacyFeaturizer?

I think looking through all the resources left me more confused rather than with a true understanding. I’m grateful to anyone who helps me connect the dots.



Hi Patrick.

I was working on content to explain the overview of the pipeline better. So let me try to connect some dots.

In Rasa, the NLU pipeline is trying to predict intents and entities.

The pipeline starts with text on one end but it is processed by multiple steps in the pipeline before we have our predictions. One of the important parts is to take tokens (extracted from the text) and to add features to them.

What features we attach depends on the steps in the pipeline, but generally we generate two types of features: sparse features and dense features.

Besides features for tokens, we also generate features for the entire utterance. This is sometimes also referred to as the CLS token. The sparse features in this token are a sum of all the sparse features in the tokens. The dense features are either a pooled sum/mean of word vectors (in the case of spaCy) or a contextualised representation of the entire text (in the case of huggingface models).
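To make the pooling idea concrete, here is a minimal sketch in plain Python. It is not Rasa's actual implementation: the helper names `sum_pool` and `mean_pool` and the toy numbers are made up for illustration, and it assumes mean pooling for the dense vectors.

```python
from typing import List


def sum_pool(token_features: List[List[float]]) -> List[float]:
    """Column-wise sum over tokens, as described for the sparse sentence features."""
    return [sum(column) for column in zip(*token_features)]


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    """Column-wise mean over tokens, one way to pool dense word vectors."""
    n_tokens = len(token_vectors)
    return [sum(column) / n_tokens for column in zip(*token_vectors)]


# Toy data (hypothetical): 3 tokens, each with 4 sparse counts and 2 dense dimensions.
sparse_tokens = [
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
]
dense_tokens = [
    [0.5, 1.0],
    [1.5, 0.0],
    [1.0, 2.0],
]

sparse_sentence = sum_pool(sparse_tokens)  # one row summarising the whole utterance
dense_sentence = mean_pool(dense_tokens)   # one averaged vector for the whole utterance

print(sparse_sentence)
print(dense_sentence)
```

The token-level rows correspond to the "sequence" features and the pooled row to the "sentence" features, which is why the sentence shape always has a single row.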

Note that there’s a community-maintained project called rasa-nlu-examples that contains many experimental featurizers for non-English languages. It’s not part of the main Rasa repository, but it can be of help to many users, as over 275 languages are supported. That library also supports gensim and GloVe embeddings.

What I hope is becoming clear here is that you can pretty much attach embeddings as you see fit. The original video mentions GloVe because it was used in a benchmark, but you can attach pretty much any features as long as you keep them compatible with the Rasa API.

After featurization we have the DIET model. It can take the features from the tokens as well as the entire sentence to predict intents/entities.

Now, just to emphasize with an example, let’s talk about how these components interact with each other.

The way the pipeline passes information along is via a Message processing system. A message is like a dictionary that changes as components process it.
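A rough sketch of that idea, using a plain dictionary instead of Rasa's real Message class. The component functions (`tokenize`, `count_tokens`) and the `run_pipeline` helper are hypothetical stand-ins, not Rasa APIs; only the key names `text` and `text_tokens` are borrowed from the output shown later in this post.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]
Component = Callable[[Message], Message]


def tokenize(message: Message) -> Message:
    """Add a 'text_tokens' key by splitting the text on whitespace."""
    message["text_tokens"] = message["text"].split()
    return message


def count_tokens(message: Message) -> Message:
    """Add a toy feature that relies on what the tokenizer stored earlier."""
    message["n_tokens"] = len(message["text_tokens"])
    return message


def run_pipeline(text: str, components: List[Component]) -> Message:
    """Pass one message through every component, in order."""
    message: Message = {"text": text}
    for component in components:
        message = component(message)
    return message


result = run_pipeline("rasa nlu examples", [tokenize, count_tokens])
print(result)
```

The order matters here: `count_tokens` would fail if it ran before `tokenize`, which mirrors why a featurizer needs a tokenizer earlier in the pipeline.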

Because components keep adding/replacing information, you can easily attach extra models. Typically you’d add extra entity extractors that are specialized towards a certain task. Let’s say you’re using the RegexEntityExtractor to attach entities via a name-list. Then the message might expand like so:
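As a rough illustration of how an extra extractor might expand a message, here is a toy name-list extractor in plain Python. It is a hypothetical stand-in and not how Rasa's own extractor is implemented; the function name `name_list_extractor` and the name list are assumptions, and the entity keys simply mirror the ones in the output shown later in this post.

```python
import re
from typing import Any, Dict, List


def name_list_extractor(message: Dict[str, Any], names: Dict[str, str]) -> Dict[str, Any]:
    """Toy extractor: tag every listed name found in the text as an entity."""
    # Keep whatever earlier components already extracted.
    entities: List[Dict[str, Any]] = list(message.get("entities", []))
    for name, entity_type in names.items():
        pattern = r"\b" + re.escape(name) + r"\b"
        for match in re.finditer(pattern, message["text"]):
            entities.append({
                "entity": entity_type,
                "start": match.start(),
                "end": match.end(),
                "value": match.group(),
                "extractor": "name_list_extractor",  # hypothetical component name
            })
    message["entities"] = entities
    return message


message = {"text": "rasa nlu examples", "entities": []}
message = name_list_extractor(message, {"rasa": "proglang"})
print(message["entities"])
```

Because the extractor appends to the existing list rather than overwriting it, entities found by other components would survive alongside the new ones.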

If you’re interested in exploring what’s happening more directly, you might like to play around with the Printer object from rasa-nlu-examples. It’s documented here and it gives information about the Message. An example is shown below.

    {
        'text': 'rasa nlu examples',
        'intent': {'name': 'out_of_scope', 'confidence': 0.4313829839229584},
        'entities': [
            {
                'entity': 'proglang',
                'start': 0,
                'end': 4,
                'confidence_entity': 0.42326217889785767,
                'value': 'rasa',
                'extractor': 'DIETClassifier'
            }
        ],
        'text_tokens': ['rasa', 'nlu', 'examples'],
        'intent_ranking': [
            {'name': 'out_of_scope', 'confidence': 0.4313829839229584},
            {'name': 'goodbye', 'confidence': 0.2445288747549057},
            {'name': 'bot_challenge', 'confidence': 0.23958507180213928},
            {'name': 'greet', 'confidence': 0.04896979033946991},
            {'name': 'talk_code', 'confidence': 0.035533301532268524}
        ],
        'dense': {
            'sequence': {'shape': (3, 25), 'dtype': dtype('float32')},
            'sentence': {'shape': (1, 25), 'dtype': dtype('float32')}
        },
        'sparse': {
            'sequence': {'shape': (3, 1780), 'dtype': dtype('float64'), 'stored_elements': 67},
            'sentence': {'shape': (1, 1756), 'dtype': dtype('int64'), 'stored_elements': 32}
        }
    }

Let me know if this helps or if there are still gaps in your knowledge.

Thank you so much for this in-depth explanation, this really cleared everything up, I really appreciate it!

Grand. I’ll consider that a first review. This should become a blog post that will go live in a week or so.

Awesome, please let me know when it’s available, I’ll gladly take a look :)