I have a Japanese bot trained with the CRF NE model. It works well when evaluated on a train/test split of the data. However, when I tested a few cases where a new name replaces a name that exists in the training data, it generalized poorly. Examples:
[福原愛]に電話をかけてください。// I replaced '福原愛' with another name, '三口百惠', and it didn't work.
[消防署]に電話。// I replaced '消防署' with '消防', and it didn't work.
Does the ‘name’ have to occur at least once in the training data?
No, not every entity has to appear in the training data. How well it generalizes depends on how strict your pattern is, whether the same keywords appear around the entity, and so on.
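To make that a bit more concrete, here is a minimal sketch of the idea, not the actual feature set of the CRF extractor. The function name `context_features` and the token split are assumptions made up for illustration. It shows that features built only from the surrounding tokens are identical for a seen and an unseen name, which is what lets a sequence model tag a name it has never seen.

```python
def context_features(tokens, i):
    """Describe token i only through its neighbours, never its own text.

    Hypothetical helper for illustration; a real CRF pipeline uses a
    richer, configurable feature set.
    """
    prev_tok = tokens[i - 1] if i > 0 else "<BOS>"
    next_tok = tokens[i + 1] if i < len(tokens) - 1 else "<EOS>"
    return {"prev": prev_tok, "next": next_tok, "is_first": i == 0}


# Assumed tokenization of the two example sentences (hand-written here).
seen = ["福原愛", "に", "電話", "を", "かけて", "ください", "。"]
unseen = ["三口百惠", "に", "電話", "を", "かけて", "ください", "。"]

# The name differs, but its context features are the same.
same_context = context_features(seen, 0) == context_features(unseen, 0)
print(same_context)
```

If the model leans mostly on the token's own text instead of features like these, replacing the name changes everything the model sees, and the prediction can break.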
Hey @twittmin. Yes, intent_entity_featurizer_regex was renamed to RegexFeaturizer. And yes, you should have this component if you are using lookup tables, because it is one of the components used to extract the patterns.
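As a rough illustration of why a lookup table needs a regex-based component, here is a sketch of how a list of entries can be turned into a single pattern whose match becomes a feature. This is not the RegexFeaturizer source; `build_lookup_pattern` and the entry list are assumptions for the example.

```python
import re


def build_lookup_pattern(entries):
    """Compile one alternation pattern from lookup-table entries.

    Hypothetical helper: longer entries are tried first so that
    '消防署' wins over its prefix '消防'.
    """
    ordered = sorted(entries, key=len, reverse=True)
    return re.compile("|".join(re.escape(entry) for entry in ordered))


# Assumed lookup-table contents, including a name absent from training data.
lookup_entries = ["福原愛", "三口百惠", "消防", "消防署"]
pattern = build_lookup_pattern(lookup_entries)

name_match = pattern.search("三口百惠に電話をかけてください。")
place_match = pattern.search("消防署に電話。")
no_match = pattern.search("こんにちは")
```

A match like `name_match` gives the downstream entity extractor a signal for a name that never occurred in the training examples, provided the name is listed in the lookup table.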