How does the CRF for entity extraction work? Best practices?

I think best-practices documents are very helpful, but I wasn’t able to find any for Rasa entity extraction. To know best practices, it often helps to know more details about the algorithms, so I’ll ask various questions here that I think could be relevant.

I think there are two CRFs (conditional random fields) one might care about in Rasa: the one inside DIET and the standalone CRFEntityExtractor. I’ll assume that DIET uses roughly the same CRF as CRFEntityExtractor.

  1. What kind of CRF is used (machine learning model and graphical model/assumed independencies)?
  2. For a given entity, what does the CRF take into account (what is conditioned on)? For example, a linear chain CRF would only take into account the tokens on either side of the entity.
    1. In DIET, there’s a transformer before the CRF, so can it attend to any token (takes into account the whole input)?
    2. Does the CRF take into account the text of the entity and not just the tokens around it (seems like the answer should be “yes”, but I vaguely remember reading that it doesn’t somewhere on the forum)?
  3. Is there a recommended maximum number of entities?
  4. Is there a recommended maximum number of roles/groups (also implemented with a CRF) to use for a given entity?
  5. Any other best practices you have in mind that aren’t covered by these questions?

Bump. Does anyone know the best (Rasa?) person to tag to answer this?

Maybe Vincent (@koaning) can help?

Also check out his awesome Algorithm Whiteboard playlist if you haven’t already!

The CRF in the DIET algorithm is a CRF-layer from Tensorflow. It’s not really a standalone conditional random field, but for many intents and purposes, it acts like one. It can be seen as the final translation step between tensors going out of the transformer and the entity predictions.
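To make that “final translation step” concrete, here is a toy, pure-Python sketch of what a linear-chain CRF layer does at decode time (this is illustrative only, not Rasa’s or TensorFlow’s actual implementation): given per-token emission scores (e.g. coming out of the transformer) and a tag-transition matrix, Viterbi decoding picks the highest-scoring tag sequence.

```python
import numpy as np

def viterbi_decode(emissions, transitions):
    """Find the highest-scoring tag sequence for a linear-chain CRF.

    emissions:   (n_tokens, n_tags) per-token scores (e.g. transformer output)
    transitions: (n_tags, n_tags) score of moving from tag i to tag j
    """
    n_tokens, n_tags = emissions.shape
    score = emissions[0].copy()  # best score so far, ending in each tag
    backpointers = []
    for t in range(1, n_tokens):
        # combine previous scores, transition scores, and current emissions
        total = score[:, None] + transitions + emissions[t][None, :]
        backpointers.append(total.argmax(axis=0))  # best previous tag per tag
        score = total.max(axis=0)
    # follow the back-pointers to recover the best path
    best_tag = int(score.argmax())
    path = [best_tag]
    for bp in reversed(backpointers):
        best_tag = int(bp[best_tag])
        path.append(best_tag)
    return list(reversed(path))

# Example: 3 tokens, 2 tags (0 = O, 1 = ENTITY); numbers are made up
emissions = np.array([[2.0, 0.0],
                      [0.0, 1.5],
                      [1.0, 0.5]])
transitions = np.array([[0.5, -0.5],
                        [-0.5, 1.0]])
print(viterbi_decode(emissions, transitions))  # [0, 1, 1]
```

Note how the transition matrix couples only *adjacent* tags: that is the linear-chain part, and it is why all wider context has to come from the emission scores upstream.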

The other CRF that Rasa provides is implemented by sklearn-crfsuite. It’s a standalone algorithm but under the hood it’s mainly a logistic regression that makes the predictions. The bulk of the learning that happens there is in the featurisation. It looks at properties of the surrounding tokens before making a prediction. These properties are also available to DIET because we’ve exposed them in the LexicalSyntacticFeaturizer.
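For a flavour of that featurisation, here is a sketch in the style of the sklearn-crfsuite tutorials. The feature names are illustrative, not Rasa’s actual feature set, but the idea is the same: each token’s features include properties of its neighbours.

```python
def token_features(tokens, i):
    """Features for tokens[i], sklearn-crfsuite tutorial style.

    Illustrative only -- not Rasa's actual LexicalSyntacticFeaturizer code.
    """
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    # look at the neighbouring tokens, as a linear-chain featuriser does
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
    else:
        feats["EOS"] = True  # end of sentence
    return feats

tokens = ["Book", "a", "flight"]
print(token_features(tokens, 1))  # includes "-1:word.lower": "book"
```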

In terms of best practices for entities: it really depends.

My main advice is to consider that an implementation may already exist in duckling or spaCy.

Perfect! For anyone else reading: this is a linear-chain CRF, which means that if it were applied directly on top of the tokens, only the given token and the two tokens adjacent to it would be used in the entity prediction. However, in DIET there’s a transformer before the CRF (and a few neural network layers before the transformer), and the gradient from the entity loss propagates through all of these layers. So entity recognition for one token should really be able to take into account any part of the input, right?

> any part of the input

Could you clarify what you mean by “any part of the input”?

Oops haha. By “input”, I don’t mean the whole training set or past messages in the conversation, just the most recent message: the text string (e.g. a phrase, a sentence, or maybe even a few sentences) that the user sent at a single turn in the conversation.

The NLU pipeline of Rasa only considers the current message and parses intents/entities out of it. It ignores the rest of the conversation.

There’s a distinction between entities and slots, though. Entities may be used to fill in slot values, and slot values are the “state” of a conversation: they store values that can be referred to later.
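To make the entity/slot distinction concrete, here is a minimal domain sketch (the `city` names are invented for illustration; depending on your Rasa version, a slot with the same name as an entity is auto-filled from it by default):

```yaml
entities:
  - city

slots:
  city:
    type: text   # filled from the "city" entity; persists across turns
```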

By “any part of the input”, I mean any part of the current message. Because of the transformer and the neural network layers before the linear chain CRF, entity extraction should be able to take into account any part of the current message, right?


I gotta ask, what made you think it didn’t?

I think the main thing that brought doubt into my mind is that I saw a Rasa team member say that the entity extractor doesn’t really take the current token into account (the current word, for anyone using the default config with WhitespaceTokenizer) the way it takes the surrounding tokens into account. It’s somewhere on the forum, but I can’t remember which post. I’m guessing that team member was mainly thinking of the LexicalSyntacticFeaturizer you mentioned, or something similar.

Also, in general, since I have a custom entity recognition case (I can’t use duckling or spaCy’s pretrained entity extractors), understanding the algorithm (in this case, just the effective graphical model for the CRF) helps me understand what I can expect of the entity extractor and how best to label data to help it do its job. But that’s probably part of why you make your great videos! :slight_smile:


I am working on a chatbot where I have to extract entities from any sentence the user types. I have used CRFEntityExtractor to extract entities, but I’m not able to get the entities for all sentences properly.

Any idea how to proceed?

Here’s my pipeline:

```yaml
language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: "CRFEntityExtractor"
  - name: DIETClassifier
    entity_recognition: False
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
```
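Not official advice, but one thing worth experimenting with, since DIET can do entity recognition itself: drop CRFEntityExtractor and let DIET extract entities, so they benefit from the transformer context discussed earlier in this thread. A sketch (assuming a recent Rasa 2.x config):

```yaml
language: en

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    entity_recognition: True   # let DIET handle entities
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
```

Also check whether your training data has enough (and consistently labelled) examples of each entity; that is usually the first thing to rule out.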