Position title like "Javascript Technical Lead" getting predicted as "Javascript" skill and "Technical Lead" as position

Hi, I am building a chatbot to support candidate searches. Candidates can be searched by their name, skills, city, college, the position they applied for, etc. The model is predicting position titles like “Javascript Technical Lead” as a “Javascript” skill and a “Technical Lead” position. Ideally, any language model should understand that the entire phrase “Javascript Technical Lead” refers to one thing unless its parts are separated by a comma or some other delimiter. I tried BERT, GPT-2, and RoBERTa, but none of them seems to work. I have lookup tables for all these entities. Please suggest a solution for this.

Sounds like you don’t have enough entity examples in your training data.

Hi Stephens, thanks for your reply. I have added many examples like [Javascript]{“entity”: “skill”}, [Technical Lead]{“entity”: “position_title”}, and [Javascript Technical Lead]{“entity”: “position_title”}. But the model works well only for the specific examples it was trained on and does not generalize. Similarly, for messages like “search candidates from Sydney Institute”, “Sydney” gets predicted as a city even though the user specifically mentioned “Sydney Institute”. So overall it looks like the pre-trained model is not able to understand the basics of English.

Looks like you’re confusing the model with these examples.

Yes, this might be confusing, but isn’t the data right? I also have a scenario with [massachusetts]{“entity”: “city”} and [massachusetts institute]{“entity”: “institute”}, but the model gets confused and predicts [massachusetts institute] as a city.

Yes, that’s a challenge. You may have to use a custom action to clean up the entity extraction results, but I would also look at creating a custom spaCy extractor.
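To make the clean-up idea concrete, here is a rough sketch of one way to do it: a greedy longest-match pass over the message text using the lookup tables you already have. It is plain Python with no Rasa or spaCy calls. The `LOOKUP` contents, the function name `longest_match_entities`, and the output dict shape are all made up for illustration, and treating commas and semicolons as hard separators is an assumption based on what you described.

```python
import re
from typing import Dict, List

# Hypothetical lookup table: lowercase phrase -> entity type.
# In practice this would be built from your own lookup files.
LOOKUP: Dict[str, str] = {
    "javascript": "skill",
    "technical lead": "position_title",
    "javascript technical lead": "position_title",
    "sydney": "city",
    "sydney institute": "institute",
    "massachusetts": "city",
    "massachusetts institute": "institute",
}


def longest_match_entities(
    text: str, lookup: Dict[str, str] = LOOKUP
) -> List[Dict[str, str]]:
    """Return lookup matches, always preferring the longest phrase."""
    max_len = max(len(key.split()) for key in lookup)
    entities: List[Dict[str, str]] = []
    # Assumption: a comma or semicolon ends a phrase, so never match across it.
    for segment in re.split(r"[,;]", text):
        tokens = [tok.strip(".?!") for tok in segment.split()]
        i = 0
        while i < len(tokens):
            # Try the longest candidate phrase first, then shorter ones.
            for n in range(min(max_len, len(tokens) - i), 0, -1):
                phrase = " ".join(tokens[i:i + n])
                label = lookup.get(phrase.lower())
                if label is not None:
                    entities.append({"entity": label, "value": phrase})
                    i += n  # skip past the tokens we just consumed
                    break
            else:
                i += 1  # no phrase starts at this token
    return entities
```

In a custom action you could run something like this on the raw message text and use its result to override shorter, conflicting predictions from the model. Because the longest phrase wins, “Sydney Institute” comes back as a single institute rather than a city, while “Javascript, Technical Lead” still splits into a skill and a position.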

Take a look at this custom spaCy entity extractor project that Vincent created. It builds a custom entity extractor for programming language names.