Hi, below is my config file:
config = """
language: "en"
pipeline:
- name: "SpacyNLP"  # loads the spacy language model
  model: "en_core_web_md"
  case_sensitive: false
- name: "SpacyTokenizer"  # splits the sentence into tokens
- name: "SpacyFeaturizer"  # transform the sentence into a vector representation
- name: "SklearnIntentClassifier"  # uses the vector representation to classify using SVM
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
  features: [
    ["low", "title", "upper", "prefix2", "suffix2"],
    ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit", "pattern"],
    ["low", "title", "upper", "prefix2", "suffix2"]
  ]
- name: "EntitySynonymMapper"  # trains the synonyms
"""
I have now provided almost 300 training examples for my intents. The NER identifies the entities it has already been trained on, but it does not generalize to entity values that are not in the training data. For example,
let's say I have training data like:
- How do i get number of issue for ABC
- How do i get number of students in XYZ
- How many number of students are in NOP
I have almost 300 training examples like these. The model predicts [ABC], [XYZ] and [NOP], but it doesn't recognize [DEF] as dept. I don't think the answer to this should be "include more training examples", because the CRF model in CoreNLP starts generalizing to such entities with very little training data.
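To make the problem concrete, here is a rough sketch of what I understand the feature names in my `CRFEntityExtractor` config to compute for each token. This is my own approximation, not Rasa's actual code: `token_features` and `shared_features` are hypothetical helper names, and the `pattern` feature (which comes from `RegexFeaturizer`) is left out. It shows which features a seen value like ABC has in common with an unseen one like DEF.

```python
def token_features(token: str) -> dict:
    """Approximate the per-token features named in the config (assumed semantics)."""
    return {
        "bias": "bias",             # constant feature, same for every token
        "low": token.lower(),       # the lowercased word itself
        "title": token.istitle(),   # True for words like "Abc"
        "upper": token.isupper(),   # True for words like "ABC"
        "digit": token.isdigit(),   # True for words like "123"
        "prefix2": token[:2],
        "prefix5": token[:5],
        "suffix2": token[-2:],
        "suffix3": token[-3:],
        "suffix5": token[-5:],
    }


def shared_features(a: str, b: str) -> list:
    """Return the names of the features that have the same value for both tokens."""
    fa, fb = token_features(a), token_features(b)
    return sorted(name for name in fa if fa[name] == fb[name])


# Compare a value from the training data with one the model has never seen.
print(shared_features("ABC", "DEF"))
```

If this approximation is right, the two tokens only agree on the shape-style features (`bias`, `digit`, `title`, `upper`), while `low` and all the prefix/suffix features are tied to the exact spelling. So my guess is that the model leans on the word identity rather than on the surrounding context, but I am not sure.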
P.S.: These are just sample names I have used. Can anyone share any thoughts on this? @akelad @erohmensing