Best way of extracting text entities from the user's input

Hello team! I’m trying to solve the following scenario. I would like to be able to extract job titles from the input entered by the user.

So if the user says "I'm looking for a software engineering position", I would get “software engineering”. How can I achieve something like this? I don’t think there’s a module in Duckling for this specific purpose.

Would regexes be enough? I can think of many different ways of saying the same thing, so maybe I could use regexes to capture what’s in the middle or at the end of the sentence.
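As a rough illustration of the regex idea, a sketch might look like the one below. The two patterns and the function name `extract_job_title` are made up for this example; they cover only a couple of phrasings, which is exactly the scaling concern.

```python
import re

# Hypothetical patterns, one per phrasing. Each captures the words that
# sit between a fixed lead-in and a fixed ending.
PATTERNS = [
    # "... looking for a <title> position/job/role"
    re.compile(r"looking for an? (?P<title>[\w\s]+?) (?:position|job|role)\b", re.IGNORECASE),
    # "... work as a <title>" up to punctuation or the end of the text
    re.compile(r"(?:work|job) as an? (?P<title>[\w\s]+?)(?:[.,!?]|$)", re.IGNORECASE),
]

def extract_job_title(text):
    """Return the first captured job title, or None if no pattern matches."""
    for pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group("title").strip()
    return None

print(extract_job_title("I'm looking for a software engineering position"))
```

Every new way of phrasing the request needs another pattern, so this probably works best as a baseline or a fallback.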

Let me know. Thanks.


You could try:

Hello @artemsnegirev, thanks for getting back to me.

The problem with lookup tables is that they’re not scalable. The world of job positions is extremely large, so I would need to generate a huge list. I was looking for something a bit more general and scalable than that. Maybe the second option you mention is better?

What’s your opinion?


Thank you!

It really depends on your needs. Just my ideas:

If you cannot create a list of job positions (because it is an open-ended list):

  • create your own custom NER tagger (ask me questions)
  • use an API that extracts job positions (example), and wrap this API as a pipeline component

If you are worried about the technical scalability of lookup tables, you could use flashtext.
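To give a feel for why dictionary-based matching can stay fast with a large list, here is a small standard-library sketch of trie-based keyword matching. It is not flashtext's API or its actual implementation; the function names `build_trie` and `extract_keywords` and the sample keywords are assumptions made for this example.

```python
import re

_END = object()  # sentinel key marking that a keyword ends at this trie node

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_trie(keywords):
    """Build a word-level trie from an iterable of keyword phrases."""
    root = {}
    for keyword in keywords:
        node = root
        for word in tokenize(keyword):
            node = node.setdefault(word, {})
        node[_END] = keyword
    return root

def extract_keywords(text, trie):
    """Scan left to right, keeping the longest keyword starting at each word."""
    words = tokenize(text)
    found = []
    i = 0
    while i < len(words):
        node, j = trie, i
        match, match_end = None, i
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if _END in node:
                match, match_end = node[_END], j
        if match is not None:
            found.append(match)
            i = match_end  # continue after the matched phrase
        else:
            i += 1
    return found

trie = build_trie(["software engineer", "software engineering", "data analyst"])
print(extract_keywords("I'm looking for a software engineering position", trie))
```

Each lookup follows the words of the input through the trie, so the cost depends mainly on the length of the input and not on how many job titles are in the list.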

If you need mappings to canonical values, e.g. “React programmer” or “React engineer” to “React developer”, then you have to use a synonym component as well.
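The mapping idea itself can be sketched in a few lines, independent of any particular framework's synonym component. The table contents and the function name `canonicalize` are assumptions for this example.

```python
# Hypothetical synonym table: lowercase surface form -> canonical value.
SYNONYMS = {
    "react programmer": "React developer",
    "react engineer": "React developer",
}

def canonicalize(job_title):
    """Map an extracted job title to its canonical value, if one is known."""
    key = " ".join(job_title.lower().split())  # normalize case and whitespace
    return SYNONYMS.get(key, job_title)

print(canonicalize("React  Engineer"))
```

Titles that are not in the table are returned unchanged, so the mapping can be applied to every extracted entity.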