Even though the blog post is aimed primarily at ML researchers and practitioners, the topic is very much relevant to everyone who wants to use today’s best-performing language models for tasks like intent classification.
If you’ve got any questions, ideas, or comments, post them here!
Will your next post be on pruning etc.? I would love to see how I can implement this directly in my Rasa tests. I have been learning a lot from using Rasa directly, fixing issues and learning as I go, so it would be cool to see how I could make use of this.
I have tried weight pruning and am currently exploring neuron pruning. The results look promising so far, though any inference-time improvements are still to be measured. If you're not afraid of dirty code, you can just watch my branch of the rasa repo for live updates.
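If you want a feeling for what weight pruning boils down to, here is a minimal, self-contained sketch using PyTorch's pruning utilities; it is only an illustration of the idea, not the actual code in my branch:

```python
# Minimal sketch of magnitude-based weight pruning (illustration only).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy classifier head, e.g. on top of BERT's 768-dimensional pooled output.
model = nn.Sequential(
    nn.Linear(768, 256),
    nn.ReLU(),
    nn.Linear(256, 7),  # e.g. 7 intents
)

# Zero out the 60% of weights with the smallest absolute value in each Linear layer,
# then make the pruning permanent by removing the re-parametrisation.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")

# Check the resulting sparsity.
zeros = sum((m.weight == 0).sum().item() for m in model.modules() if isinstance(m, nn.Linear))
total = sum(m.weight.numel() for m in model.modules() if isinstance(m, nn.Linear))
print(f"sparsity: {zeros / total:.2%}")
```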
What is your motivation when it comes to model compression? Do you just want to see if big models can be made faster, or is it something else that’s driving your interest?
Maybe you want to have a look at this article from spaCy, in which distillation is explained properly. I've had really good experiences with packaging e.g. a fine-tuned BERT model as a spaCy model, which could then be imported into Rasa without problems, e.g. by using the following pipeline:
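Something along these lines - just a sketch, assuming Rasa 1.x component names; the exact parameters may differ:

```yaml
# Sketch of a Rasa NLU pipeline built around a BERT model packaged as a spaCy model;
# replace the model name with your own spaCy package.
language: de
pipeline:
  - name: SpacyNLP
    model: de_pytt_bertbasecased_lg_gnad   # the spaCy-packaged, fine-tuned BERT
    case_sensitive: true
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: SklearnIntentClassifier
```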
In addition to @JulianGerhard’s advice, I should say that a model pruning blog post is coming soon. I am also making the code for a BERT-based intent classifier (which supports weight/neuron pruning) significantly easier to use.
That being said, Rasa probably won’t officially ship model compression techniques in the near future, but the code I wrote should serve as a good, re-usable example in case you want to apply quantisation, weight pruning or neuron pruning to a model.
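To give a flavour of how lightweight e.g. post-training quantisation can be, here is a short sketch using PyTorch's dynamic quantisation (an illustration only, not the exact code I mentioned; the model name and label count are placeholders):

```python
# Sketch: post-training dynamic quantisation of a (fine-tuned) BERT classifier.
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=9  # placeholder model name and label count
)
model.eval()

# Convert the weights of all nn.Linear layers to int8; activations are quantised
# on the fly at inference time.
quantized_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
# quantized_model is a drop-in replacement for inference, smaller on disk and
# usually noticeably faster on CPU.
```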
What kind of model are you trying to compress, @cyrilthank?
I have created a bert_spacy_rasa repo in which I describe a quick dive into the matter. The distillation part, however, still needs to be added, and maybe we could collaborate on that.
I keep this repo pretty much up to date, so if you want to share the code, @SamS, I can use it.
My idea behind this repo was to make life easier for those who want to follow this track and don't want to, or don't have the time to, puzzle everything together.
Thanks @JulianGerhard, looks like you really want to embarrass me.
But here goes… I see your dataset is for category-wise classification.
Can you please share any similar datasets (not necessarily in the classification domain, but in an entity/intent recognition "framework") where we could try this?
I'm asking because, judging from the below, I assume this is for a bot which is classifying text, while I am stuck with entity/intent recognition dataset issues:
POST
{
"text": "<any article you want to get its domain for>"
}
Correct - this dataset is not optimal, since it cannot directly be seen as a conversational-AI-related dataset, but it fitted my needs:
- It is large enough to be representative for a fine-tuning evaluation task.
- It has only a few classes, which means that if a class is treated as an "intent" in Rasa, it can easily be integrated, since only minor manual effort is necessary.
Imagine the bot as a classifier… you want to know the domain of a given article and the bot utters its domain - that seems fair enough for the moment, doesn't it?
One thing should be mentioned here: the current spaCy-packaged models are not capable of providing directly extracted entities. This has to do with the fine-tuning format for entity transfer learning. I am currently working on this and will update the repo asap.
I have solved that by using two models in the same pipeline for Rasa.
If this doesn't answer your question, please describe in a bit more detail what you want to achieve!
Everything's OK - don't worry! Since most of our own datasets are restricted for compliance reasons, I couldn't use one of those. I needed a free one and saw that the DeepSet team used the same GNAD dataset for evaluating their German pretrained BERT - so I decided to "misuse" it.
Of course I can do that. As soon as I realized that I wouldn't be able to use the fine-tuned BERT-spaCy model in Rasa for e.g. extracting entities like PERSON (in fact, duckling is currently not able to do that), I thought about how this could be done in general:
1. Use the SpacyFeaturizer and SpacyEntityExtractor, which would currently be the recommended way but which is not yet possible due to the manual effort needed on the BERT side (as mentioned, I am working on that).
2. Fine-tune the pretrained BERT (which is afterwards converted into a spaCy-compatible model) on any NER dataset; this is absolutely possible and intended. We can fine-tune BERT on both tasks alongside each other. If we do, the model contains everything we need to derive entities from it, currently just not with spaCy directly. Instead we could use a CustomBERTEntityExtractor which loads the model the pipeline has already loaded and does the work that spaCy is currently not "able" to do.
Since 2 seems to be overhead, at least for the moment, why not do the following:
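For example, roughly like this (again only a sketch, assuming Rasa 1.x syntax, not an exact, tested config):

```yaml
# Sketch: two spaCy models in one Rasa pipeline - the BERT-based one feeds the
# intent classifier, a standard spaCy model handles entity extraction.
language: de
pipeline:
  - name: SpacyNLP
    model: de_pytt_bertbasecased_lg_gnad   # spaCy-packaged, fine-tuned BERT
    case_sensitive: true
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: SklearnIntentClassifier
  - name: SpacyNLP
    model: de_core_news_md                 # standard spaCy model, used only for NER
  - name: SpacyEntityExtractor
```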
This pipeline will then load and use the features of de_pytt_bertbasecased_lg_gnad for SklearnIntentClassifier, and the features of de_core_news_md for SpacyEntityExtractor.
This is not a neat solution and it should only be used until there is a smarter way (1, 2), but it works.
It should be mentioned that you are of course able to fine-tune even spaCy's de_core_news_md model, or to train your own.
Definitely this is helping me think through this better.
Sorry, I am getting spoilt here by your "instant replies", but it is helping me think this through.
Question, from a pipeline/workflow perspective:
If I were to fine-tune BERT for entity recognition/intent classification using domain-specific news data (i.e. the data is already picked up from a weather site, for example, and we don't need to classify weather news separately),
what would the steps look like with spaCy and without spaCy?
Please feel free to respond later and not feel rushed
This can't be answered in a single reply. I assume that you are familiar with the history of word embeddings and what to do with them, so I will skip that part. If you know what they are capable of, then you should ask yourself: do I need their advantages? I am not really sure about your use case, but I'll try:
Yes, you would be able to fine-tune BERT on a domain-specific news dataset, provided there is enough data for BERT to learn from. This can be done either by fine-tuning BERT alone (there are several very good scripts in the HuggingFace repo for that) or by doing it with the spacy-pytorch-transformers library. The latter will allow you to follow the steps described in my repo.
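To illustrate the first route, here is a heavily condensed sketch using the HuggingFace transformers library (a recent version is assumed; the example texts, labels, hyperparameters and output path are placeholders, not the actual scripts from their repo):

```python
# Condensed sketch: fine-tuning BERT as a sequence classifier with HuggingFace
# transformers. Proper dataset loading/batching is omitted; values are placeholders.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-german-cased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-german-cased", num_labels=3  # e.g. weather / sports / politics
)

texts = ["Morgen wird es heiß", "Das Spiel endete 2:1", "Die Wahl ist vorbei"]
labels = torch.tensor([0, 1, 2])
enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(**enc, labels=labels)  # returns an output object with a .loss field
    outputs.loss.backward()
    optimizer.step()

# Save so it can later be used directly, or packaged e.g. with spaCy.
model.save_pretrained("bert-news-domains")
tokenizer.save_pretrained("bert-news-domains")
```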
The second question is what you want to do with that fine-tuned BERT. If you want to use it as a classifier, you have two choices: you could either train/use the fine-tuned BERT as a classifier directly (e.g. following this one), or you could provide its features to the next algorithm that can use them, e.g. by packaging it with spaCy and using a supervised-embeddings config from Rasa. If you want to use it for entity extraction it's pretty much the same: either use it directly or use it in a spaCy pipeline - the caveats of this approach are described in my repo and here.
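Using the fine-tuned model directly as a classifier is then roughly this (again only a sketch, with the placeholder path from above):

```python
# Sketch: using the fine-tuned BERT directly as a domain/intent classifier.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-news-domains")
model = BertForSequenceClassification.from_pretrained("bert-news-domains")
model.eval()

enc = tokenizer("oh it is hot", return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits
predicted_label = logits.argmax(dim=-1).item()  # e.g. the index of the "weather" class
```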
If I may quote you:
“(i.e. the data is already picked up from a weather site, for example, and we don't need to classify weather news separately)”
I don't quite get this part, but it seems you want to pick up that weather data and extract "entities" from it?
I want to:
a. use weather data,
b. fine-tune BERT on it using the "spaCy route" you mentioned in point 1 of your answer,
c. so that Rasa can pick up "oh it is hot" as a "weather_intent" and not as a "spice_intent".
Can you please advise, based on your extensive experience, what the steps (from a pipeline/workflow perspective) might be to achieve that?