I thought a CRF without spaCy would be very nice, but I have found that the POS tag feature might be important. I have been playing with the new CRF for a while. Many of my entities never appear in the training data, so I removed the low feature: without the POS feature it has too much influence, and new words are not detected. How should I choose the new set of features to get good results for unknown entities? I don’t want to use the lookup table right now because I have no list of entities yet. My domain is also too large for a proper lookup table…
Any idea?
What are good features for NER with short sentences?
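
To make the question concrete, here is a minimal sketch of the kind of POS-free feature set I mean. It is plain Python and does not use any library's API; the function names (`word_shape`, `token_features`, `sentence_features`) and the feature keys are hypothetical, chosen only for illustration. The idea is to describe each token by its shape, affixes and neighbours rather than by its identity, so that an unseen word can still look like an entity. Whether this actually helps on short sentences is an assumption that would need testing.

```python
import re


def word_shape(token):
    """Coarse shape of a token: 'Berlin' -> 'Xx', 'A320' -> 'Xd'."""
    chars = []
    for c in token:
        if c.isupper():
            chars.append("X")
        elif c.islower():
            chars.append("x")
        elif c.isdigit():
            chars.append("d")
        else:
            chars.append(c)  # keep punctuation such as '-' or '.'
    # Collapse repeated characters so word length does not matter.
    return re.sub(r"(.)\1+", r"\1", "".join(chars))


def token_features(token, prefix):
    """Identity-free features for one token (no lowercased word form)."""
    return {
        prefix + "shape": word_shape(token),
        prefix + "prefix2": token[:2].lower(),
        prefix + "suffix3": token[-3:].lower(),
        prefix + "is_title": token.istitle(),
        prefix + "is_upper": token.isupper(),
        prefix + "is_digit": token.isdigit(),
    }


def sentence_features(tokens):
    """One feature dict per token, with a one-token context window."""
    all_features = []
    for i, token in enumerate(tokens):
        feats = {"bias": 1.0}
        feats.update(token_features(token, "0:"))
        if i > 0:
            feats.update(token_features(tokens[i - 1], "-1:"))
        else:
            feats["BOS"] = True  # beginning of sentence
        if i < len(tokens) - 1:
            feats.update(token_features(tokens[i + 1], "+1:"))
        else:
            feats["EOS"] = True  # end of sentence
        all_features.append(feats)
    return all_features


example = sentence_features(["fly", "to", "Berlin"])
```

A list of dicts like `example` is the per-token input format that CRF toolkits such as python-crfsuite typically expect. The lowercased word itself is left out on purpose, since that is the feature that seemed to dominate once POS tags were gone.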