Are the lemmatized words used as features for NER_CRF?


(Datisto) #1

German has many inflected forms, which makes training difficult. For example, any of

ein/eine/einer/eines

can appear in front of the entity.

Are the words used for training exactly as written, or in some lemmatized form?

Ideally I would like to train only

ein Hotel

instead of also having to train

eines Hotels, einem Hotel

If only the raw word is used, do you need to train every possible grammatical form for German, because otherwise the entity will not be extracted?

And where would I have to implement my own stemmer?
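
To make clearer what I mean, here is a minimal sketch of the behaviour I am hoping for. It is plain Python and does not use the Rasa API. The `LEMMAS` table and the `normalize` and `token_features` functions are hypothetical names I made up for illustration, and the table is hand-written rather than a real German lemmatizer. The idea is that the features for the entity token are built from normalized forms, so all three phrases end up looking the same to the model:

```python
# Toy lemma table: hypothetical and hand-written for this example only,
# not a real German lemmatizer or stemmer.
LEMMAS = {
    "eine": "ein",
    "einer": "ein",
    "eines": "ein",
    "einem": "ein",
    "einen": "ein",
    "hotels": "hotel",
}


def normalize(token):
    """Lowercase the token and map it to its lemma if the toy table knows it."""
    lowered = token.lower()
    return LEMMAS.get(lowered, lowered)


def token_features(tokens, i):
    """Build a feature dict for token i from normalized forms.

    Uses the normalized token itself and, if there is one, the normalized
    token to its left. This only mimics the general shape of CRF features.
    """
    features = {"lemma": normalize(tokens[i])}
    if i > 0:
        features["prev_lemma"] = normalize(tokens[i - 1])
    else:
        features["BOS"] = True  # token is at the beginning of the sentence
    return features


if __name__ == "__main__":
    for phrase in (["ein", "Hotel"], ["eines", "Hotels"], ["einem", "Hotel"]):
        # Features for the second token (the entity candidate) in each phrase.
        print(phrase, token_features(phrase, 1))
```

With something like this, one training example for "ein Hotel" would cover the other two forms as well, which is what I would like to achieve.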