CRF with dense features

Hello,

I am trying to implement an entity extraction model using the CRF extractor. To do so, I have made 2 configurations: one with only sparse features and another one that takes dense features (BERT) as well.

Here are the configurations:

With sparse features only:

pipeline:
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CRFEntityExtractor

With pre-trained embeddings (BERT):

pipeline:
  - name: HFTransformersNLP
    model_weights: "bert-base-multilingual-uncased"
    model_name: "bert"
  - name: LanguageModelTokenizer
  - name: LanguageModelFeaturizer
  - name: CRFEntityExtractor

The problem is that I get exactly the same results, as if the dense features were not being considered at all, even though I can see the language model being used during the training phase. The same happens when I train on a different dataset.

The rasa version that I’m using is 2.5.1.


This is weird. Couple of questions:

  • Do you get the same results with cross-validation?
  • Does the bot make the same mistakes?
  • What is the bot’s domain? If you have a narrow domain it’s better to go supervised; LMs won’t make much of a difference.

“The problem is that I get exactly the same results, as if the dense features were not being considered at all, even though I can see the language model being used during the training phase.” How did you observe that?

Thank you for your reply.

  • I do not use cross-validation as I want to compare models using the same test dataset.
  • The exact same mistakes are being made with and without the LM.
  • The bot’s goal is to fill administrative forms, so the entities are quite general: name, surname, age, profession… I do not agree that LMs don’t make much of a difference, as I can see a difference when using the DIET config as an entity extractor. I don’t know if I can conclude that LMs do best when passed through a transformer, but I think it’s quite weird that I am getting the exact same results with the CRF.

I’m observing the results using the CRFEntityExtractor report and errors files.

Hi! I am having the exact same issue.

Hello @liaeh! From what I understand (I am not sure this is really the case), the CRF takes into account only sequence features, while the LanguageModelFeaturizer returns both sentence and sequence features.

According to the documentation, the classifier decides which kind of features to use. As the CRF is a sequence model, it should use the sequence features, but as the results show, it seems to fail to do so.
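
To illustrate what I mean by the two feature types, here is a rough sketch of the shapes involved (illustration only, not Rasa's actual code; the dimension is just an example):

import numpy as np

# Illustration: the two kinds of dense features a featurizer can produce.
tokens = ["je", "m'appelle", "Marie"]
dim = 768  # e.g. a BERT hidden size

# Sequence features: one vector per token -> shape (len(tokens), dim)
sequence_features = np.zeros((len(tokens), dim))

# Sentence features: a single vector for the whole message -> shape (1, dim)
sentence_features = np.zeros((1, dim))

# A CRF tags token by token, so only the per-token (sequence) features
# line up with what it needs; the single sentence-level vector does not.
print(sequence_features.shape, sentence_features.shape)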

What I did, and it seems to be working, is use the entity extraction of the DIETClassifier and set the number of transformer layers to 0. That way, the configuration is almost the same as the CRFEntityExtractor, except for the feed-forward layers between the features and the CRF. As the DIET architecture takes the language model features into account, it’s a way to trick the CRF into using them as well.
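
Something like this, roughly (a sketch rather than my exact config; the option names are the DIETClassifier parameters from the Rasa 2.x docs):

pipeline:
  - name: WhitespaceTokenizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "bert-base-multilingual-uncased"
  - name: DIETClassifier
    intent_classification: False
    entity_recognition: True
    number_of_transformer_layers: 0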

Hope this solves your problem.


Hi @imene_tar, thanks for the answer! Your workaround is helpful.

Still, as you say, it seems like the documentation is incorrect: CRFEntityExtractor is supposed to include all dense features of length len(tokens), which it does not. I have opened a GitHub issue about this and linked our forum posts - hopefully it helps with a fix.

Looking forward to reading their replies.

I think I’ve found the issue. It’s explained in more detail on the GitHub issue here.

If the problem is the tokenizer, then why does it work well when using DIET? Did you try your configuration without using DIET at all?

The DIET classifier in my example is set to only predict intents. It doesn’t predict entities.

Yes, but when using DIET as an entity extractor with the exact same tokenizer, the language model is taken into consideration, while that’s not the case when using the CRF only. (As I stated previously, I think DIET knows which features to choose; that’s why the CRF is working in your case.)

The difference is caused by the WhitespaceTokenizer though. I just ran the same configurations but without DIET, and there I also saw two different confusion matrices.

What I meant is that when using the LanguageModelTokenizer with DIET, the model works perfectly, so I don’t understand why it should cause any problem with the CRF.

Anyway, I have replaced it with the WhitespaceTokenizer as you suggested, and I still get the exact same results with and without the language model.

That’s … odd. Could you share your full nlu data?

train.yml (5.7 KB) test.yml (5.8 KB) domain.yml (591 Bytes)

Here are the training and the test data as well as the domain.

Thanks again for your help.

Hi Vincent! Were you able to test your model on my dataset? I am curious to know the nature of the problem.

I just ran my two configurations against your data and I can again confirm that the predictions from the CRFEntityExtractor are different. You can explore both the confusion matrices and the configurations by expanding below.

Base Config
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: CRFEntityExtractor

Language Model Config
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: LanguageModelFeaturizer
    model_name: "roberta"
    model_weights: "roberta-base"
  - name: CRFEntityExtractor

Since the only difference between the two models is the LanguageModelFeaturizer component, I must conclude that the CRFEntityExtractor is indeed picking up the features. I also notice that the second configuration is a fair bit slower to run, again indicating that the extra features are being computed and used.

Ah wait. I may have found a difference. I am running my benchmark via cross-validation.

rasa test nlu --config config.yml --cross-validation --runs 1 --folds 2 --out gridresults/basic-config-nodiet-french
rasa test nlu --config config-lm.yml --cross-validation --runs 1 --folds 2 --out gridresults/basic-config-lm-nodiet-french

You may be running against a test-set. Lemme try that real quick.
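
For reference, evaluating against a held-out test file instead of cross-validating would look something like this (file and output names are placeholders):

rasa test nlu --model models/config-base.tar.gz --nlu test.yml --out gridresults/base-testset
rasa test nlu --model models/config-lm.tar.gz --nlu test.yml --out gridresults/lm-testset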

When I train both systems I see a difference in training time.

> time rasa train nlu --config config.yml --fixed-model-name config-base
3.66s user 3.07s system 139% cpu 4.842 total

> time rasa train nlu --config config-lm.yml --fixed-model-name config-lm
24.26s user 3.96s system 244% cpu 11.536 total

When I run the test command I see the same pattern.

> time rasa test nlu --model models/config-base.tar.gz
5.29s user 4.41s system 189% cpu 5.111 total

> time rasa test nlu --model models/config-lm.tar.gz
38.32s user 6.11s system 205% cpu 21.620 total

So there’s definitely a difference between the two configurations just by looking at the time it takes. The reported confusion matrices are the same from both of these test runs because every entity was predicted correctly by the model.

One thing to keep in mind: it’s possible that the CRF model ignores the BERT features simply because it doesn’t need them.

When a token is featurized, both sparse and dense features are created. If the sparse features alone already yield a perfect prediction, the model can effectively ignore the dense features. This is possible.

I will investigate further later today, though, to confirm whether I can spot a difference between confidence values.
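
In the meantime, if anybody wants to compare confidences themselves, one way is to serve each trained model with rasa run --enable-api (on its own port) and query the /model/parse endpoint. This is just a sketch and the example text is a placeholder:

import requests

# Sketch: compare entity predictions/confidences from two locally served models.
# Assumes something like:
#   rasa run --enable-api --model models/config-base.tar.gz --port 5005
#   rasa run --enable-api --model models/config-lm.tar.gz --port 5006
text = "my name is Marie and I am 32 years old"  # placeholder example

for port in (5005, 5006):
    response = requests.post(f"http://localhost:{port}/model/parse", json={"text": text})
    entities = response.json().get("entities", [])
    print(port, [(e.get("entity"), e.get("value"), e.get("confidence")) for e in entities])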