Entities not being extracted correctly

Hi,

I have a CSV file with nearly 7000 names and 7000 student IDs (in the format OG69038, for example). My model has two entities: name and student_id. I put 10 names and 10 student IDs from this data into the NLU training data and trained the model, using “pretrained_embeddings_spacy” as the pipeline.

After training, the model also predicts names as student_id entities. Can someone help me with this? Also, is there a way to train the NLU so that it extracts both entities correctly (for all 7000)?

So you only used 10 of each for training? The model will likely predict better if you provide more examples than that. If all student IDs follow the format OG#####, the RegexFeaturizer might also help. And are you putting full sentences in your training data, or just the bare names and student IDs?
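One way to get more (and sentence-shaped) examples out of the CSV is to fill a few templates with sampled rows. Below is a minimal sketch, assuming the Markdown NLU annotation style `[value](entity)`; the column names `name` and `student_id`, the templates, and the helper `make_examples` are all hypothetical, and the inline CSV is just a stand-in for the real file.

```python
import csv
import io
import random

# Hypothetical sentence templates; real ones should mirror how users actually phrase things.
TEMPLATES = [
    "my name is [{name}](name)",
    "my student id is [{sid}](student_id)",
    "I am [{name}](name) and my id is [{sid}](student_id)",
    "look up [{sid}](student_id) please",
]


def make_examples(rows, n, seed=0):
    """Sample n annotated training sentences from (name, student_id) rows."""
    rng = random.Random(seed)  # fixed seed so the generated data is reproducible
    examples = []
    for _ in range(n):
        row = rng.choice(rows)
        template = rng.choice(TEMPLATES)
        # Unused keyword arguments are ignored by str.format, so every template works.
        examples.append("- " + template.format(name=row["name"], sid=row["student_id"]))
    return examples


# Stand-in for the real CSV file; the column names are assumptions.
SAMPLE_CSV = """name,student_id
Alice Smith,OG69038
Bob Jones,OG12345
Carol Lee,OG54321
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for line in make_examples(rows, 5):
    print(line)
```

Mixing sentences that contain only a name, only an ID, and both gives the extractor context words to tell the two entities apart, rather than seeing each value on its own.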

Hi, thanks for your response. I added more examples for student_id and also added a regex, but the model still does not recognize a student ID as an entity when that ID is not in the lookup table, even though it has the same format.
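A quick way to rule out the pattern itself is to check it outside Rasa. The sketch below assumes IDs are exactly "OG" followed by five digits (based on OG69038); the `LOOKUP` set and the `compare` helper are hypothetical. It shows that a lookup table only knows the values listed in it, while the regex also matches unseen IDs of the same shape. As far as I understand, the RegexFeaturizer passes these matches to the entity extractor as features rather than as hard rules, so the model still needs enough annotated examples to learn to use them.

```python
import re

# "OG" followed by exactly five ASCII digits, e.g. OG69038 (format assumed from the question).
STUDENT_ID = re.compile(r"OG[0-9]{5}")

# Hypothetical lookup table holding only the IDs that appear in the training data.
LOOKUP = {"OG69038", "OG12345"}


def compare(tokens):
    """Map each token to (found in lookup table, matches the ID regex)."""
    return {tok: (tok in LOOKUP, STUDENT_ID.fullmatch(tok) is not None) for tok in tokens}


print(compare(["OG69038", "OG99999", "Alice"]))
```

Here OG99999 is missing from the lookup table but still matches the regex, and a name like Alice matches neither.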

Hm, when this happened:

still it is not able to recognize student id as an entity which is not in lookup table but of the same format.

What was your input? Can you share the input and output as you did above? Also, what does your pipeline look like?

This is my input and output:

And this is my pipeline: