NER_CRF generalizes very badly

Training NER_CRF is really frustrating, even for a single entity.

I build up the training data sentence by sentence: whenever NER fails on a sentence, I add it and fill it with different entity values. I get the following results:

Adding just one example interferes with other examples, so that those other examples are no longer recognized. Also, testing on examples that appear word for word in the training data sometimes fails.

I feel I have to add every exact sentence structure to the data. NER_CRF does not learn to mix context words, so examples that combine two context words from two separate training examples fail.

Even if I have all sub-parts of a sentence in the training data, testing on an example that combines two of those parts fails…

I use these features:

 ["prefix5", "prefix2", "suffix3",
             "suffix2", "title", "upper"],
            ["bias", "upper", "title", "digit", "pattern"],
            ["prefix5", "prefix2", "suffix3",
             "suffix2", "title", "upper"]],

Any help?

Could you post your training data so we can see what’s going wrong there?

I think these are typical situations that most people run into. I would just like some advice; this should be possible somehow. Every time I read these threads I see the request for training data, but I think there should be some good guidelines from someone with experience in this. I don’t want to make the data public.

What I think is that, for my config, some statistics about context-word occurrences would be important to know; I sketch one way to collect them below.

And, as I stated, even examples from the training data are not recognized when tested on the exact same text. How can that be? Also, learning mixtures of different context words is really difficult, and adding a new example interferes negatively with previous examples…
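Something like this could collect those statistics, assuming the standard rasa_nlu_data JSON layout for the training file:

    import json
    from collections import Counter

    def context_counts(path):
        """Count the words directly before/after each annotated entity,
        to see how varied the entity contexts in the training data are."""
        with open(path) as f:
            examples = json.load(f)["rasa_nlu_data"]["common_examples"]
        before, after = Counter(), Counter()
        for ex in examples:
            text = ex["text"]
            for ent in ex.get("entities", []):
                left = text[:ent["start"]].split()
                right = text[ent["end"]:].split()
                if left:
                    before[left[-1].lower()] += 1
                if right:
                    after[right[0].lower()] += 1
        return before, after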

Well, there aren’t really any general guidelines for this… It depends on what your training data looks like, what kinds of entities there are, how much of it there is, etc.

If you don’t want to share your training data here, we’d be happy for you to send it to hi@rasa.com so we can take a look (confidentially of course)

@akelad thanks, I will think about it. But really, the issues above are common, and of course they come from overfitting. It is just really hard to find out how much variation the CRF needs to generalize well! From what I read, many people have these troubles. In my case, almost always both context words have to coincide with the trained example (the config above uses only context properties). Also, entities at the end of sentences are not recognized well, even if the same sentence is in the training data…

I gave that some thought. I feel the only solution is to build a more elaborate evaluation system that does something like backward fitting (removing one example at a time) until the best training data is found…
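Roughly what I have in mind, as a sketch: train() and evaluate() are hypothetical stand-ins for whatever training/evaluation harness is used, and since every candidate removal retrains the model, this costs O(n²) training runs:

    def backward_fit(examples, dev_set, train, evaluate):
        """Greedily drop the training example whose removal most improves
        the dev-set score, and repeat until no removal helps."""
        best_score = evaluate(train(examples), dev_set)
        improved = True
        while improved and len(examples) > 1:
            improved = False
            for i in range(len(examples)):
                candidate = examples[:i] + examples[i + 1:]
                score = evaluate(train(candidate), dev_set)
                if score > best_score:
                    best_score, examples, improved = score, candidate, True
                    break  # accept the removal and rescan
        return examples, best_score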

If this is something you want to work on implementing, then we’d love it if you shared it with us

Is it possible to access the learned feature weights of the CRF? I think that would be great for seeing where the overfitting happens and which examples to include.

Feel free to look at the code, I think you should be able to.
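For instance, ner_crf is built on sklearn-crfsuite, and a fitted sklearn_crfsuite.CRF exposes its learned weights directly via state_features_ and transition_features_. A minimal sketch, assuming you can get hold of the fitted CRF instance inside ner_crf (the attribute path may vary between versions):

    def print_top_weights(crf, n=20):
        """crf: a fitted sklearn_crfsuite.CRF instance.
        state_features_ maps (feature_name, label) -> learned weight."""
        by_magnitude = sorted(crf.state_features_.items(),
                              key=lambda kv: abs(kv[1]), reverse=True)
        for (feature, label), weight in by_magnitude[:n]:
            print(f"{weight:+.3f}  {label:<12}  {feature}")
        # transition_features_ maps (label_from, label_to) -> weight
        for (src, dst), weight in sorted(crf.transition_features_.items()):
            print(f"{weight:+.3f}  {src} -> {dst}")

Features with unusually large weights that are tied to a single surface form (e.g. one specific prefix) are a decent hint at where the model has overfit to individual training sentences.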