Has anyone tried using 5 words (2 preceding / 2 following) instead of 3 (1 preceding / 1 following) with the CRF model in rasa_nlu? I tried it briefly with a fairly small training set, and the confidence dropped drastically. I guess it’s a matter of increasing the amount of training data, since there are more parameters to train. Does anyone have a feeling for how much training data a 5-word model needs to work better than a 3-word model?
I haven’t tried that but it sounds interesting. More data seems sensible. On the confidence, is that not to be expected? The probability of n-grams decreases with larger n. Isn’t the key thing whether the relative confidence is helpful within the CRF model output?
You’re right, the absolute value of the confidence may not mean much, as long as the correct entity has the highest confidence. I will be trying 5-word CRF models as I have different entities that can appear in identical 3-word contexts. Will try to remember to follow up here if I have any success.
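To make the "identical 3-word contexts" point concrete, here is a minimal sketch in plain Python (not rasa_nlu code). The helper `context_window`, the two sentences, and the idea that "Monday" would be a departure day in one and an arrival day in the other are all made up for illustration. The real CRF looks at features of the neighbouring tokens rather than the raw tokens, but the window idea is the same.

```python
def context_window(tokens, index, size):
    """Return the `size` tokens centered on tokens[index].

    Positions that fall outside the sentence are padded with None.
    `size` is expected to be odd so that there is a center word.
    """
    half = size // 2
    return [
        tokens[i] if 0 <= i < len(tokens) else None
        for i in range(index - half, index + half + 1)
    ]


# Hypothetical examples: "Monday" would be labeled differently in each.
departure = "leaving on Monday at noon".split()
arrival = "arriving on Monday at noon".split()
target = departure.index("Monday")  # same position in both sentences

# With a 3-word window the two contexts cannot be told apart.
same_with_3 = context_window(departure, target, 3) == context_window(arrival, target, 3)

# With a 5-word window the extra preceding word separates them.
same_with_5 = context_window(departure, target, 5) == context_window(arrival, target, 5)

print("identical with 3 words:", same_with_3)
print("identical with 5 words:", same_with_5)
```

In this toy case only the wider window sees "leaving" versus "arriving", which is the kind of situation where a 5-word model might help.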
@einar.bui How did you get the CRF model to use 5 words instead of 3? Was it a change to the configuration in the pipeline, as follows? If so, I’d love to give this a try on our data set and see what happens. I have 200+ examples for some of my entities, and I’m curious whether I’ll get better results with a 5-word model.
- name: "ner_crf"
  features: [[first],[second],[third],[fourth],[fifth]]
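For anyone who wants to experiment, here is a rough sketch of how the five feature lists could be generated rather than typed by hand. It assumes that `ner_crf` takes an odd number of lists with the middle one applying to the current word, and that feature names such as `low`, `title`, `upper`, `bias`, and `suffix3` are valid in your rasa_nlu version; please check both against the docs. The helper `crf_features` and the constants are hypothetical names, not part of rasa_nlu.

```python
import json

# Assumed feature names; verify them against your rasa_nlu version.
CENTER = ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3",
          "suffix2", "upper", "title", "digit", "pattern"]
NEIGHBOR = ["low", "title", "upper"]


def crf_features(window):
    """Build one feature list per position in a window of `window` words.

    The middle list describes the current word; the lists before and
    after it describe the preceding and following words.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    before = [list(NEIGHBOR) for _ in range(half)]
    after = [list(NEIGHBOR) for _ in range(half)]
    return before + [list(CENTER)] + after


features = crf_features(5)

# JSON is also valid YAML flow syntax, so this can be pasted into the
# pipeline section of a YAML config as a single list item.
print(json.dumps({"name": "ner_crf", "features": features}, indent=2))
```

Using the same small feature set for all neighbouring words is just one choice; with limited training data it may be worth giving the outer two positions fewer features than the inner two.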