Everything's OK, don't worry! Since most of our own datasets are compliance-restricted, I couldn't use one of those. I needed a free one and saw that the DeepSet team used the same GNAD dataset for evaluating their German pretrained BERT, so I decided to "misuse" it.
Of course I can do that. As soon as I realized that I wouldn't be able to use the fine-tuned BERT-spaCy model in Rasa for, e.g., extracting entities like PERSON (in fact, Duckling is currently not able to do that), I thought about how this would be done in general:
1. Use the SpacyFeaturizer and SpacyEntityExtractor. This would currently be the recommended way, but it is not yet possible because of the manual effort still needed on the BERT side (as mentioned, I am working on that).
2. Fine-tune the pretrained BERT (which is afterwards converted into a spaCy-compatible model) on any NER dataset. This is absolutely possible and intended; we can fine-tune the BERT on both tasks alongside each other. The resulting model then contains everything we need to derive entities from it, just not with spaCy directly at the moment. Instead we could use a CustomBERTEntityExtractor which loads the model that the pipeline has already loaded and does the work that spaCy is currently not "able" to do.
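To make the idea concrete, here is a minimal sketch of what such a CustomBERTEntityExtractor could look like. The class name follows the text above and the entity dict shape follows Rasa's convention; the real component would subclass Rasa's `Component` and reuse the pipeline's loaded model, while here the model is replaced by a toy stub so the sketch is self-contained:

```python
# Hedged sketch: maps token-level NER predictions from a (fine-tuned)
# BERT model onto Rasa-style entity dicts. The model is a stub here;
# a real component would wrap the model the pipeline already loaded.

class CustomBERTEntityExtractor:
    def __init__(self, ner_model):
        # ner_model: callable returning (token, label, start, end) tuples
        self.ner_model = ner_model

    def process(self, text):
        entities = []
        for token, label, start, end in self.ner_model(text):
            if label != "O":  # skip non-entity tokens
                entities.append({
                    "entity": label,
                    "value": token,
                    "start": start,
                    "end": end,
                    "extractor": "CustomBERTEntityExtractor",
                })
        return entities


def toy_ner_model(text):
    # stand-in for the fine-tuned BERT: tags capitalized words as PER
    out, pos = [], 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        label = "PER" if token[0].isupper() else "O"
        out.append((token, label, start, end))
    return out


extractor = CustomBERTEntityExtractor(toy_ner_model)
print(extractor.process("meet Anna tomorrow"))
# → [{'entity': 'PER', 'value': 'Anna', 'start': 5, 'end': 9,
#    'extractor': 'CustomBERTEntityExtractor'}]
```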
Since option 2 seems to be too much overhead, at least for the moment, why not do the following:
```yaml
- name: SpacyNLP
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: SklearnIntentClassifier
- name: RegexFeaturizer
- name: CRFEntityExtractor
- name: DucklingHTTPExtractor
  dimensions: ['time', 'duration', 'email']
- name: SpacyEntityExtractor
  dimensions: ['PER', 'LOC', 'CARDINAL']
- name: rasa_mod_regex.RegexEntityExtractor
- name: EntitySynonymMapper
```
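Note that the DucklingHTTPExtractor in this pipeline expects a running Duckling server (e.g. started locally via `docker run -p 8000:8000 rasa/duckling`); the URL below is an assumption for such a local setup:

```yaml
- name: DucklingHTTPExtractor
  url: "http://localhost:8000"
  dimensions: ['time', 'duration', 'email']
```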
This pipeline will then load the spaCy model and use its features for the SklearnIntentClassifier, and the features of the RegexFeaturizer for the CRFEntityExtractor.
This is not a neat solution, and it should only be used until there is a smarter way (see options 1 and 2 above), but it works.
It should be mentioned that you can, of course, also fine-tune spaCy's de_core_news_md model or train your own.
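For completeness, here is a minimal sketch of such an NER update using spaCy v3's training API (the exact API differs between spaCy versions; the blank German pipeline, label, and toy training data are assumptions for illustration — for de_core_news_md you would load the model and get its existing "ner" pipe instead):

```python
# Hedged sketch: train/update a German NER component with spaCy v3.
# A blank pipeline and a single toy example are used for illustration.
import spacy
from spacy.training import Example

nlp = spacy.blank("de")
ner = nlp.add_pipe("ner")
ner.add_label("PER")

TRAIN_DATA = [
    ("Anna wohnt in Berlin.", {"entities": [(0, 4, "PER")]}),
]

optimizer = nlp.initialize()
for _ in range(10):
    for text, annotations in TRAIN_DATA:
        example = Example.from_dict(nlp.make_doc(text), annotations)
        nlp.update([example], sgd=optimizer)

doc = nlp("Anna wohnt in Berlin.")
print([(ent.text, ent.label_) for ent in doc.ents])
```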
Did that help you?