Is there a good approach to leveraging both the pre-trained power of spaCy and the flexibility/trainability of a CRF model?
Say you use ‘ner_spacy’ for entity extraction to reasonably good effect in most cases. But then you have case-specific phrasing that is handled poorly by spaCy, and the only way to get good performance is to train a custom entity extractor, e.g. ‘ner_crf’. Ideally, I would like to just generate relevant training data (using Chatito) that covers the problematic phrases/entities, and stick the ‘ner_crf’ component in the pipeline to cover those. What would be the best way to do this though?
I’m mainly worrying about whether/how to disambiguate when spaCy classifies as one entity and CRF as another. Or should the full information be left to Core to decide what to do? I’m currently not using Core yet (but I plan to), and for my dialogue handler, it’s certainly much better if there aren’t conflicting entities in the NLU output.
I’ve thought about creating a MetaEntityExtractor trainable component that picks an entity from the extracted ones, using as features the entity type, entity extractor, extraction confidence, etc… A simpler version of this would be to just set a hard confidence threshold on the CRF output (e.g. 70%) and if the confidence is lower, go with the spaCy prediction. Any thoughts on this?
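To make the simpler thresholded version concrete, the resolution logic could look something like this (a rough sketch over Rasa-NLU-style entity dicts; the `resolve_entities` name, the 0.7 default, and the exact field handling are my own assumptions, not an existing component):

```python
def resolve_entities(entities, crf_threshold=0.7):
    """Resolve conflicts between overlapping spaCy and CRF entities.

    `entities` is a list of Rasa-NLU-style entity dicts with "start",
    "end", "extractor" and (for ner_crf) a "confidence" field.
    When a CRF and a spaCy entity overlap, keep the CRF one only if
    its confidence clears the threshold; otherwise fall back to spaCy.
    """
    crf = [e for e in entities if e.get("extractor") == "ner_crf"]
    spacy_ents = [e for e in entities if e.get("extractor") == "ner_spacy"]
    other = [e for e in entities
             if e.get("extractor") not in ("ner_crf", "ner_spacy")]

    def overlaps(a, b):
        return a["start"] < b["end"] and b["start"] < a["end"]

    # confident CRF predictions always win
    kept_crf = [c for c in crf if c.get("confidence", 0.0) >= crf_threshold]
    # spaCy entities survive unless a kept CRF entity covers the same span
    kept_spacy = [s for s in spacy_ents
                  if not any(overlaps(s, c) for c in kept_crf)]
    # low-confidence CRF entities survive only where spaCy found nothing
    for c in crf:
        if c not in kept_crf and not any(overlaps(c, s) for s in kept_spacy):
            kept_crf.append(c)

    return sorted(other + kept_spacy + kept_crf, key=lambda e: e["start"])
```

This would slot into a custom component's `process` step, running after both extractors in the pipeline.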
I have the same question, as I struggle to understand how to use any result provided by ner_spacy in Rasa Core. Core seems to use ner_crf for filling slots, and I haven’t found any documented way to access the ner_spacy result and fill a slot with it.
Ideally, you should use the same entity names spaCy provides as your slot names. For example, if spaCy has an entity called LOC, your slot name should be LOC as well.
If you have the same entity name for your CRF and spaCy, then you are looking at some custom implementation of your extraction technique (a confidence threshold, for example). But I am not sure why you would want the same entity extracted by both spaCy and CRF. Besides, spaCy’s entity extraction is based on pre-trained vectors, so labelling those entities in your training data has no real value.
That actually works. I created a slot PERSON and now I am getting names in there as I needed. The idea was to use spaCy for registration information without being forced to train all the names, locations and so on into CRF, since spaCy already provides an excellent pre-trained model.
So let’s take an example, based on the restaurant bot for simplicity. Assume that spaCy extracts the LOC entity very well in most cases, e.g. in the sentence “find me a chinese restaurant in Paris”. But then you find that your users ask in a slightly different way, where spaCy misclassifies LOC as PERSON for some reason: “find me a Paris chinese restaurant” (not very realistic, but it makes the point). What I want is to train an entity model to improve the recognition of the location in this specific instance. I’m sure there are multiple approaches to this, which is why I posted the question.
My idea was to train a CRF model with training data that addresses the problematic phrases, and then write a component that decides whether to go with the spaCy or CRF prediction (in the cases where they differ).
I guess another idea (if using Rasa Core) would be to give the CRF entities different names (e.g. LOC_crf), and just include slots for both LOC and LOC_crf (do Core policies take into account the confidence of an extracted entity?)
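If I went the renaming route, tagging the CRF training data could be automated with a small script over the old Rasa NLU JSON format (a sketch; `rename_entities` and the `_crf` suffix are just my naming):

```python
def rename_entities(training_data, suffix="_crf"):
    """Append a suffix to every entity label in Rasa-NLU-style JSON
    training data, so CRF-extracted entities (e.g. LOC_crf) can live
    alongside spaCy's LOC without clashing in the NLU output."""
    examples = training_data.get("rasa_nlu_data", {}).get("common_examples", [])
    for example in examples:
        for ent in example.get("entities", []):
            ent["entity"] = ent["entity"] + suffix
    return training_data
```

The same training sentences could then be reused for both entity schemes, just with different labels.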
If you want to improve what spaCy extracts, you can do that in spaCy itself. spaCy has good documentation on training your own language model and adding new entity types. I would avoid splitting the same entities between a custom entity extractor and spaCy, because you will need a lot of training data to do so.
You can also try the NER in the TensorFlow pipeline, which is spaCy-independent.
Another easy way for location extraction would be to use a phrase matcher in spaCy as well.
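To illustrate the idea behind phrase matching for a closed list like country names, here is a plain-Python sketch (in practice spaCy’s PhraseMatcher does this efficiently over tokenized text; the `phrase_match` helper and output format below are just an illustration):

```python
def phrase_match(text, phrases, label="LOC"):
    """Naive phrase matcher: longest case-insensitive match wins.

    Scans the text for each phrase in a fixed list (e.g. country
    names) and emits entity dicts, skipping overlapping matches.
    """
    lowered = text.lower()
    entities = []
    # try longer phrases first so "United Kingdom" beats "Kingdom"
    for phrase in sorted(phrases, key=len, reverse=True):
        start = lowered.find(phrase.lower())
        while start != -1:
            end = start + len(phrase)
            if not any(e["start"] < end and start < e["end"] for e in entities):
                entities.append({"entity": label,
                                 "value": text[start:end],
                                 "start": start, "end": end,
                                 "extractor": "phrase_matcher"})
            start = lowered.find(phrase.lower(), end)
    return sorted(entities, key=lambda e: e["start"])
```

Since the set of countries is small and fixed, this kind of lookup sidesteps the ambiguity problem entirely.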
Location and Person are notoriously ambiguous, since Paris is also a person’s name (Paris Hilton).
Yes, there are definitely frequent ambiguities in NER… that’s what makes it an interesting problem
I don’t agree with going down the route of fine-tuning / training from scratch the spaCy NER, that really requires a lot of data (if I remember correctly, ~5000 examples per entity type according to the creator), and moreover, if you start with the pre-trained model and want to avoid catastrophic forgetting, you need to use large amounts of pretty general training data in addition to the new case-specific data.
CRF, on the other hand, doesn’t seem to need a lot of training data. The question is just whether to train it with data annotated as LOC_crf and PERSON_crf, and then let Core deal with the implications of having two different entities that really refer to a “location” and thus should have a similar effect on the dialogue flow; or to use LOC and PERSON and have a separate NLU component that decides whether to go with the spaCy or CRF prediction (probably mostly based on the CRF confidence value). Since I’m not using Core yet, I prefer the latter.
I think phrase matching is most useful if you have a fixed-length list of case-specific, non-generic entities that you want to extract? I haven’t really used it so far.
Your second idea, extracting the same entity with two different extractors and checking which has higher confidence, sounds better. I am interested in your approach as well. We use the LOC entity too, but mostly to detect countries, so I plan to use the phrase-matcher approach, since there is a limited number of countries.
Person names are really difficult, but I am not sure why you need them. If it is not an open-domain chatbot, I would not handle names with a predictive model but rather use form control. You should also take cultural sensitivity into account.
Not sure if you have already found a better solution to this; if so, I’m interested to know, particularly regarding PERSON name recognition.
I found something interesting in the sample code of the rasa-demo bot: in the actions.py file, at line 339, they have created an “action_store_entity_extractor” which, according to its docstring, “takes the entity which the user wants to extract and checks what pipelines can be used”. I haven’t tried setting this up with mine yet, but it sounds like a great solution. NER is such an important tool for my project.
Has anyone tried to customize this action to solve this “leveraging spacy and CRF” in their assistant?