Need help with identifying multiple entities within a single token

Ex. "Attendance for R15CS019"

I’ve been trying to mark “R15CS019” as the entity SRN and only the “CS” inside it as the entity Branch.

I am using the JSON training data format, with the rasa_nlu trainer.

{ "text": "What is my attendance r15cs019 ?", "intent": "getAttendance", "entities": [ { "start": 22, "end": 30, "value": "r15cs019", "entity": "SRN" }, { "start": 25, "end": 27, "value": "CSE", "entity": "Branch" } ] },

With the spacy_sklearn pipeline, only the SRN entity is recognized. Is there any way to make it recognize both entities simultaneously?

Is this a standard format? If you can recognize the whole thing with, for example, a regex featurizer, I think it would be best to keep using regexes to extract the middle part as a separate value.

As a general note, it is my personal opinion that you should let the entity extractor do only the hard work of extracting the substring you want out of the sentence. If that substring follows a certain logic, use a separate step to work on it (such as extracting the “CS”, for example). Entities should only be used to extract values from user messages. If you need the bot to use those values to drive its logic, use slots, custom actions, the Keras policy, etc.
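To make that post-processing step concrete, here is a minimal sketch in plain Python. It assumes the SRN always looks like the example in the question (one letter, two digits, a two-letter branch code, three digits); that layout is only inferred from “R15CS019”, and the names `SRN_PATTERN` and `extract_branch` are made up for illustration.

```python
import re

# Assumed SRN layout, inferred only from the example "R15CS019":
# one letter, two digits, a two-letter branch code, three digits.
SRN_PATTERN = re.compile(r"^([A-Za-z])(\d{2})([A-Za-z]{2})(\d{3})$")


def extract_branch(srn):
    """Return the upper-cased branch code from an SRN, or None if it does not match."""
    match = SRN_PATTERN.match(srn.strip())
    if match is None:
        return None
    # Group 3 is the two-letter branch code in the assumed layout.
    return match.group(3).upper()


print(extract_branch("r15cs019"))  # CS
```

You would call something like this on the value of the SRN entity after the NLU step, so the training data only needs to annotate the SRN itself.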

Thanks for the suggestion. I saved it in a slot and indexed it. Thank you.
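For anyone landing here later, the slot-and-index approach could look roughly like the sketch below. It assumes the branch code always sits at character positions 3 and 4 of the SRN, as it does in “r15cs019”; the function name `branch_from_srn` is hypothetical, and how the slot value is read depends on your setup.

```python
def branch_from_srn(srn_slot_value):
    """Slice the branch code out of an SRN string taken from a slot.

    Assumes the fixed layout seen in "r15cs019", where the branch
    occupies indices 3 and 4. Returns None for missing or short values.
    """
    if srn_slot_value is None or len(srn_slot_value) < 5:
        return None
    return srn_slot_value[3:5].upper()


print(branch_from_srn("r15cs019"))  # CS
```

Plain slicing is simpler than a regex but does not validate the rest of the string, so it only suits cases where the SRN format is guaranteed.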
