CRF with dense features

Hello,

I am trying to implement an entity extraction model using the CRF extractor. To do so, I have made 2 configurations: one with only sparse features and another one that takes dense features (BERT) as well.

Here are the configurations:

With sparse features only:

pipeline:
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CRFEntityExtractor

With pre-trained embeddings (BERT):

pipeline:
  - name: HFTransformersNLP
    model_weights: "bert-base-multilingual-uncased"
    model_name: "bert"
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  - name: CRFEntityExtractor

The problem is that I get exactly the same results, as if the dense features were not being considered at all, even though I can see the language model being used during the training phase. The same happens when I train on a different dataset.

The rasa version that I’m using is 2.5.1.


This is weird. Couple of questions:

  • Do you get the same results with cross-validation?
  • Does the bot make the same mistakes?
  • What is the bot’s domain? If you have a narrow domain it’s better to go supervised; LMs won’t make much of a difference.

“The problem is that I get exactly the same results, as if the dense features were not being considered at all, even though I can see the language model being used during the training phase.” How did you observe that?

Thank you for your reply.

  • I do not use cross-validation as I want to compare models using the same test dataset.
  • The exact same mistakes are being made with and without the LM.
  • The bot’s goal is to fill administrative forms, so the entities are quite general: name, surname, age, profession… I do not agree that LMs don’t make much of a difference, as I can see a difference when using the DIET config as an entity extractor. I don’t know if I can conclude that LMs do best when passed through a transformer, but I think it’s quite weird that I am getting the exact same results with the CRF.

I’m observing the results using the CRFEntityExtractor report and errors files.

Hi! I am having the exact same issue.

Hello @liaeh! From what I understand (I am not sure this is really the case), the CRF takes into account only sequence features, while the LanguageModelFeaturizer returns both sentence and sequence features.

According to the documentation, the classifier decides which kind of features to use. As the CRF is a sequence model, it should use the sequence features, but as the results show, it seems to fail to do so.
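
To illustrate what I mean by the two feature types, here is a rough sketch of the shapes involved (illustration only, not Rasa's actual code; the dimension is just an example):

import numpy as np

# Illustration: the two kinds of dense features a featurizer can produce.
tokens = ["je", "m'appelle", "Marie"]
dim = 768  # e.g. a BERT hidden size

# Sequence features: one vector per token -> shape (len(tokens), dim)
sequence_features = np.zeros((len(tokens), dim))

# Sentence features: a single vector for the whole message -> shape (1, dim)
sentence_features = np.zeros((1, dim))

# A CRF tags token by token, so only the per-token (sequence) features
# line up with what it needs; the single sentence-level vector does not.
print(sequence_features.shape, sentence_features.shape)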

What I did, and it seems to be working, is use the entity extraction of the DIETClassifier and set the number of transformer layers to 0. That way, the configuration is almost the same as the CRFEntityExtractor, except for the feed-forward layers between the features and the CRF. As the DIET architecture takes the language model features into account, it’s a way to trick the CRF into using them as well.
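
Something like this, roughly (a sketch rather than my exact config; the option names are the DIETClassifier parameters from the Rasa 2.x docs):

pipeline:
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "bert-base-multilingual-uncased"
  - name: DIETClassifier
    intent_classification: False
    entity_recognition: True
    number_of_transformer_layers: 0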

Hope this solves your problem.


Hi @imene_tar, thanks for the answer! Your workaround is helpful.

Still, as you say, it seems like the documentation is incorrect: CRFEntityExtractor is supposed to include all dense features of length len(tokens), which it does not. I have opened a GitHub issue about this and linked our forum posts - hopefully it helps with a fix.

Looking forward to reading their replies.

I think I’ve found the issue. It’s explained in more detail on the GitHub issue here.

If the problem is the tokenizer, then why does it work well when using DIET? Did you try your configuration without using DIET at all?

The DIET classifier in my example is set to only predict intents. It doesn’t predict entities.

Yes, but when using DIET as an entity extractor with the exact same tokenizer, the language model is taken into consideration, while that’s not the case when using the CRF only. (As I stated previously, I think DIET knows which features to choose; that’s why the CRF is working in your case.)

The difference is caused by the WhitespaceTokenizer though. I just ran the same configurations but without DIET, and there I also saw two different confusion matrices.

What I meant is that when using the LanguageModelTokenizer with DIET, the model works perfectly, so I don’t understand why it should cause any problem with the CRF.

Anyway, I have replaced it with the WhitespaceTokenizer as you suggested, and I still get the exact same results with and without the language model.

That’s … odd. Could you share your full nlu data?

train.yml (5.7 KB) test.yml (5.8 KB) domain.yml (591 Bytes)

Here are the training and the test data as well as the domain.

Thanks again for your help.

Hi Vincent! Were you able to test your model on my dataset? I am curious to know the nature of the problem.

I just ran my two configurations against your data and I can again confirm that the predictions from the CRFEntityExtractor are different. You can explore both the confusion matrices and the configurations by expanding below.

Base Config
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CRFEntityExtractor

Language Model Config
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: LanguageModelFeaturizer
    model_name: "roberta"
    model_weights: "roberta-base"
  - name: CRFEntityExtractor

Since the only difference between the two models is the LanguageModelFeaturizer component, I must conclude that the CRFEntityExtractor is indeed picking up the features. I also notice that the second configuration is a fair bit slower to run, again indicating that the extra features are being computed and used.

Ah wait. I may have found a difference. I am running my benchmark via cross-validation.

rasa test nlu --config config.yml --cross-validation --runs 1 --folds 2 --out gridresults/basic-config-nodiet-french
rasa test nlu --config config-lm.yml --cross-validation --runs 1 --folds 2 --out gridresults/basic-config-lm-nodiet-french

You may be running against a test-set. Lemme try that real quick.
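
For reference, evaluating against a held-out test file instead of cross-validating would look something like this (file and output names are placeholders):

rasa test nlu --model models/config-base.tar.gz --nlu test.yml --out gridresults/base-testset
rasa test nlu --model models/config-lm.tar.gz --nlu test.yml --out gridresults/lm-testset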

When I train both systems I see a difference in training time.

> time rasa train nlu --config config.yml --fixed-model-name config-base
3.66s user 3.07s system 139% cpu 4.842 total

> time rasa train nlu --config config-lm.yml --fixed-model-name config-lm
24.26s user 3.96s system 244% cpu 11.536 total

When I run the test command I see the same pattern.

> time rasa test nlu --model models/config-base.tar.gz
5.29s user 4.41s system 189% cpu 5.111 total

> time rasa test nlu --model models/config-lm.tar.gz
38.32s user 6.11s system 205% cpu 21.620 total

So there’s definitely a difference between the two configurations just by looking at the time it takes. The reported confusion matrices are the same from both of these test runs because every entity was predicted correctly by the model.

One thing to keep in mind: it’s possible that the CRF model ignores the BERT features simply because it doesn’t need them.

When a token is featurized, both sparse and dense features are created. If the sparse features alone already yield a perfect prediction, the model can effectively ignore the dense features. This is possible.

I will investigate further later today, though, to confirm whether I can spot a difference between confidence values.
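
In the meantime, if anybody wants to compare confidences themselves, one way is to serve each trained model with rasa run --enable-api (on its own port) and query the /model/parse endpoint. This is just a sketch and the example text is a placeholder:

import requests

# Sketch: compare entity predictions/confidences from two locally served models.
# Assumes something like:
#   rasa run --enable-api --model models/config-base.tar.gz --port 5005
#   rasa run --enable-api --model models/config-lm.tar.gz --port 5006
text = "my name is Marie and I am 32 years old"  # placeholder example

for port in (5005, 5006):
    response = requests.post(f"http://localhost:{port}/model/parse", json={"text": text})
    entities = response.json().get("entities", [])
    print(port, [(e.get("entity"), e.get("value"), e.get("confidence")) for e in entities])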