Following is the pipeline I am using to train an NER model:
Pipeline configuration:

    language: "en"
    pipeline:
    - name: "SpacyNLP"
    - name: "tokenizer_spacy"
    - name: "ner_crf"
    - name: "ner_synonyms"
    - name: "CRFEntityExtractor"
    - name: "intent_entity_featurizer_regex"

This is how I add my lookup table of technical skills (a text file listing the skill terms) to the training data:

    {
      "rasa_nlu_data": {
        "common_examples": [ ... ],
        "lookup_tables": [
          {
            "name": "technical_skills",
            "elements": "data/tech_skills_lookup/technical_skills.txt"
          }
        ]
      }
    }

After training, this is the regex stored in the model file component_5_RegexFeaturizer.pkl:

    [
      {
        "name": "technical_skills",
        "pattern": "(?i)(\\bpytorch\\b|\\br\\b|\\bmachine\\ learning\\ frameworks\\b|\\bsentiment\\ analysis\\b|\\bdata\\ structures\\b|\\bn\\-grams\\b|\\bpython\\b|\\btext\\ representation\\ techniques\\b|\\bjava\\b|\\bkeras\\b|\\bbag\\ of\\ words\\b|\\bsemantic\\ extraction\\ techniques\\b|\\bmodeling\\b|\\bbig\\ data\\b|\\bibm\\ cloud\\b|\\bamazon\\ alexa\\b|\\bmicrosoft\\b|\\bc\\#net\\b|\\bnodejs\\b|\\bgoogle\\ dialogflow\\b|\\bibms\\ watson\\ conversation\\ service\\b|\\bamazon\\b|\\bazure\\b|\\bpython\\b|\\b\\b|\\bscala\\b|\\bapache\\ lucene\\b|\\bapache\\ spark\\b|\\bnumpy\\b|\\bcorenlp\\b|\\bcomputer\\ science\\b|\\bapache\\ opennlp\\b|\\btextblob\\b|\\bmllib\\b|\\bspacy\\b|\\bscikit\\-learn\\b|\\bgensim\\b|\\bsolr\\b|\\bpandas\\b|\\bpython\\b|\\bnltk\\b|\\bscipy\\b|\\bglove\\b|\\bkeras\\ pytorch\\b|\\bmachine\\ learning\\b|\\btensorflow\\b|\\br\\b|\\bword2vec\\b|\\bmathematics\\b|\\bdata\\ cleaning\\b|\\bwrangling\\b|\\brnn\\b|\\bforecast\\ modeling\\b|\\bword\\ embedding\\b|\\btensorflow\\b|\\bkeras\\b|\\bsequence\\ modeling\\b|\\bcnn\\b|\\bfeature\\ engineering\\b|\\bnips\\b)"
      }
    ]

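The stored pattern has a recognizable shape: each lookup entry is regex-escaped, wrapped in `\b` word boundaries, and joined into one case-insensitive alternation (the doubled backslashes in the file are JSON string escaping, so `\\b` there is the regex `\b`). The minimal Python sketch below rebuilds a pattern of that shape from a few sample entries. It is an illustration under that assumption, not Rasa's own code; `lookup_to_regex` and `sample_lines` are hypothetical names. Note how a blank line in the lookup file becomes the empty alternative `\b\b`, which also appears in the stored pattern.

```python
import re
from typing import Iterable


def lookup_to_regex(elements: Iterable[str]) -> str:
    """Hypothetical helper: join escaped, word-bounded entries into one alternation.

    Assumption: the real featurizer builds its pattern in a similar way;
    this only illustrates the shape of the stored pattern.
    """
    # Escape each entry so ' ', '-' and '#' are matched literally,
    # then require a word boundary on both sides of it.
    parts = [r"\b" + re.escape(e.strip()) + r"\b" for e in elements]
    # One case-insensitive group containing every alternative, in file order.
    return "(?i)(" + "|".join(parts) + ")"


# Hypothetical lookup-file lines (one entry per line); "" stands for a blank line.
sample_lines = ["python", "machine learning", "c#net", "", "nips"]
pattern = lookup_to_regex(sample_lines)
print(pattern)
# (?i)(\bpython\b|\bmachine\ learning\b|\bc\#net\b|\b\b|\bnips\b)
```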
But I don't know why the model is not able to recognize "nips", even though it is in the lookup table. Example code and output:

    interpreter.parse("nips")
    {'intent': {'name': None, 'confidence': 0.0}, 'entities': [], 'text': 'nips'}

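One property of the stored pattern can be checked with Python's `re` module alone: alternatives are tried left to right at each position, and the first one that succeeds wins. In the stored pattern the empty alternative `\b\b` comes before `\bnips\b`, so a search on "nips" returns an empty match at position 0. The sketch below uses a shortened, hypothetical copy of the pattern that keeps the relative order of those alternatives. Whether this is why the entity is missed also depends on how the regex featurizer consumes matches, which this sketch does not model.

```python
import re

# Shortened, hypothetical copy of the stored pattern: it keeps the relative
# order of "python", the empty alternative "\b\b" and "nips" from the real one.
with_empty = r"(?i)(\bpython\b|\b\b|\bnips\b)"
# The same pattern with the empty alternative (from a blank lookup line) removed.
without_empty = r"(?i)(\bpython\b|\bnips\b)"

for name, pat in [("with \\b\\b", with_empty), ("without \\b\\b", without_empty)]:
    m = re.search(pat, "nips")
    # At position 0 the first alternative that succeeds is used,
    # even if it matches the empty string.
    print(name, m.span(), repr(m.group(0)))
# with \b\b (0, 0) ''
# without \b\b (0, 4) 'nips'
```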
It seems the lookup table is not being used, even though it is part of my trained model. Strange behavior!
Please help me out.