Similar Entity Extraction

I am having trouble extracting entities for the following use case:

User: Hi
Bot: Hello! How may I help you?
User: I want my tax receipt
Bot: Sure! Could you provide me your ID?
User: 59 (the user could also say "it is 59", "ID is 59", "ID=59" or "my ID is 59")
Bot: Your user number?
User: It is 9154
Bot: Your transaction number?
User: 745

I want to extract 59, 9154 and 745. How should I proceed? I have trained the model, but when the ID and the user number have the same length, the entity is not extracted. The intent classification works fine, though, and the conversation follows the story.

Have you tried using the ner_duckling_http entity extractor?

I am using ner_crf container

For numbers, please try the ner_duckling_http extractor.

Actually, installing Duckling is a tedious task. I'm getting too many errors while installing it.

You don’t need to install it. You can run it with Docker: https://rasa.com/docs/nlu/master/components/#ner-duckling-http

Hi there, I have the Duckling server running in Docker and it seems to be working. However, it seems to have slowed my training down to a snail's pace. I am also unsure how to enter training data, since Duckling creates its own entities. Do I just enter the intent and add the Duckling entity names to my rasa_core config?

Hmm, Duckling shouldn’t affect your training at all. Duckling just extracts the entities, so there is no need to label them in the training data.
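To illustrate why no labelling is needed, here is a minimal sketch of querying a Duckling server directly. It assumes Duckling is reachable at http://localhost:8000 (the port and the helper names `build_duckling_request` and `parse_with_duckling` are assumptions, not part of Rasa), and it only uses the Python standard library.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Duckling is listening locally on port 8000.
DUCKLING_URL = "http://localhost:8000/parse"


def build_duckling_request(text, locale="en_US", url=DUCKLING_URL):
    """Build (but do not send) a form-encoded POST request for Duckling."""
    form = urllib.parse.urlencode({"text": text, "locale": locale})
    return urllib.request.Request(url, data=form.encode("utf-8"), method="POST")


def parse_with_duckling(text, locale="en_US", url=DUCKLING_URL):
    """Send the text to Duckling and return the decoded JSON response."""
    request = build_duckling_request(text, locale, url)
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `parse_with_duckling("my ID is 59")` should return a list of matches, each describing the matched text span and its value. The extraction is rule-based, which is why it is independent of the Rasa training data.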

What if the inputs are alphanumeric, e.g. "my ID is NP_45680780"? And how do I handle the case where the user replies in a different order than the bot asked? E.g. Bot: "Can I have your ID?" User: "My user number is NA_56098".

One more question: how do I extract the desired entity when a message contains multiple entities? For example: "my transaction number is QR%56873578 for user number 987_AT". I have to extract QR%56873578.
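One way to think about both questions is to key each value on the phrase that introduces it rather than on its position or length. The sketch below is plain Python, not a Rasa component; the field names and trigger phrases are hypothetical and would need to match how your users actually write.

```python
import re

# Hypothetical field names and trigger phrases, chosen for illustration only.
FIELD_PATTERNS = {
    "transaction_number": re.compile(
        r"transaction number\s+(?:is\s+)?(\S+)", re.IGNORECASE
    ),
    "user_number": re.compile(r"user number\s+(?:is\s+)?(\S+)", re.IGNORECASE),
    "id": re.compile(r"\bid\b\s*(?:is|=)?\s*(\S+)", re.IGNORECASE),
}


def extract_fields(text):
    """Return {field: value} for every trigger phrase found in the text."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Drop sentence punctuation that may trail the captured token.
            found[field] = match.group(1).rstrip(".,;!?")
    return found


message = "my transaction number is QR%56873578 for user number 987_AT"
fields = extract_fields(message)
print(fields["transaction_number"])
```

Because the label comes from the user's own wording, a reply such as "my user number is NA_56098" is filed under `user_number` even if the bot had asked for the ID.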

I’m wondering the same thing here. It doesn’t seem like Duckling is the solution for custom entities such as alphanumeric codes like you and I are working with.

Does anyone have any suggestions for extracting custom alphanumeric codes? Duckling and entity lookup tables haven’t seemed to work after some testing. Maybe creating some Regex patterns would work?

Thanks!

There is a description of regex patterns at the bottom of this page.

Thank you. I’m familiar with the regex documentation, but zip codes and phone numbers seem to be a bit different from custom alphanumeric codes, since zip codes and phone numbers always have the same format while unique alphanumeric codes do not.

I was more so looking for some thoughts on the effectiveness (and if it is possible) to use Regex patterns for a situation such as this.

You can also have a regex pattern for alphanumeric codes.

Take the example of a user ID.

Let’s say it is 6 characters long: it starts with G, followed by 5 digits, the first of which is not zero.

G([1-9]\d{4})

Then I should provide examples such as

my id is G15367 [ID] …
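As a rough sketch of what such training examples could look like, the snippet below builds entries in the legacy Rasa NLU JSON training format and computes the entity offsets from the regex. The intent name `inform_id` and the helper `make_example` are hypothetical; the word boundaries around the pattern are an addition so that longer codes are not partially matched.

```python
import json
import re

# The pattern from above, wrapped in word boundaries.
USER_ID = re.compile(r"\bG[1-9]\d{4}\b")


def make_example(text, intent, entity="ID"):
    """Build one training example with character offsets for each match."""
    entities = [
        {"start": m.start(), "end": m.end(), "value": m.group(0), "entity": entity}
        for m in USER_ID.finditer(text)
    ]
    return {"text": text, "intent": intent, "entities": entities}


training_data = {
    "rasa_nlu_data": {
        "regex_features": [{"name": "ID", "pattern": r"G[1-9]\d{4}"}],
        "common_examples": [
            make_example("my id is G15367", "inform_id"),
            make_example("G20481", "inform_id"),
            make_example("ID is G99001", "inform_id"),
        ],
    }
}

print(json.dumps(training_data, indent=2))
```

The regex feature only gives the CRF a hint that a token matches the pattern; the labelled examples are what teach it that such tokens are the ID entity.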

Alternatively, you can add a regex entity extractor: a component that takes a regular expression pattern as a rule and finds entities in a given token (similar to Duckling).

Also, FYI, you can add custom rules to Duckling if you have the hang of Haskell. They have recently added a feature for defining custom dimensions.
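The core of such a rule-based extractor can be sketched in a few lines of plain Python. This is not the API of any existing Rasa component; the entity names, patterns and the `extract_entities` helper are assumptions for illustration, and wiring it into a pipeline as a custom component is left out.

```python
import re

# Hypothetical entity names and patterns, for illustration only.
PATTERNS = {
    "ID": re.compile(r"\bG[1-9]\d{4}\b"),
    "number": re.compile(r"\b\d+\b"),
}


def extract_entities(text, patterns=PATTERNS):
    """Return every pattern match as an entity dict, ordered by position."""
    entities = []
    for name, pattern in patterns.items():
        for match in pattern.finditer(text):
            entities.append(
                {
                    "entity": name,
                    "value": match.group(0),
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    return sorted(entities, key=lambda e: e["start"])


for entity in extract_entities("my id is G15367 and code 745"):
    print(entity["entity"], entity["value"])
```

Unlike the CRF, this needs no training examples, but it also has no notion of context: every string that matches a pattern is returned.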

Awesome! Thank you. I can confirm for @ashukrishna100 that this does indeed work. I used a few guides online that provided regex variable charts to come up with the regex pattern suited for our use case. Performance is great after implementing a regex pattern!

Yeah it works and with optimum performance for sure. Thank you so much @ccelotto

Very informative. No doubt you are a Rasa star. I have tried the regex entity extractor and it works fine. Thanks @souvikg10

Hi,

Could you please share how it works? I am also using it, but with no luck.

NLU: "regex_features": [ { "name": "Transaction_ID", "pattern": "[^a-zA-Z0-9]+$" } ]

pipeline:

  • name: "tokenizer_whitespace"
  • name: "intent_entity_featurizer_regex"
  • name: "ner_crf"
  • name: "ner_synonyms"
  • name: "intent_featurizer_count_vectors"
  • name: "intent_classifier_tensorflow_embedding"

Thanks

This was super helpful for me in creating the regex for my use case http://www.cbs.dtu.dk/courses/27610/regular-expressions-cheat-sheet-v2.pdf

From looking at your regex pattern, it seems like it may be missing some parentheses/brackets.

This is mine: (([A-Za-z]{1})([0-9]{6,7}))

It is used for picking up on alphanumeric codes similar to “A123456”, “G493024”, “F4930294”.

So, breaking it down: the first group, ([A-Za-z]{1}), states that the first character {1} will be a letter [A-Za-z]. (Writing the range as [A-z] would also match the punctuation characters that sit between Z and a in ASCII, such as [ and _.)

The second group, ([0-9]{6,7}), states that the next 6-7 characters {6,7} will be digits 0 through 9 [0-9].
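To check a pattern like this before putting it into the training data, it can help to try it in Python first. The sketch below uses an equivalent pattern with the letter class written as [A-Za-z] and with word boundaries added (an assumption, so that shorter or longer codes are rejected); `split_code` is a hypothetical helper.

```python
import re

# One letter followed by 6 or 7 digits, as a whole word.
CODE = re.compile(r"\b([A-Za-z])([0-9]{6,7})\b")


def split_code(text):
    """Return (letter, digits) for the first code in the text, or None."""
    match = CODE.search(text)
    if match is None:
        return None
    return match.group(1), match.group(2)


for sample in ["my code is G493024", "F4930294", "A12345"]:
    print(sample, "->", split_code(sample))

# The range A-z spans more than letters: it also matches an underscore.
print(bool(re.fullmatch(r"[A-z]", "_")))
```

The last sample has only five digits, so it is not picked up.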

Hopefully you can use this as a reference! If not, tell me what you’re trying to accomplish, and I’ll do my best to help you craft it.

P.S. Don’t forget (as mentioned in the Rasa docs for regex) to provide some training examples that contain the pattern, or the model won’t know to pick up on it.

Hi Christopher,

Thank you very much for the detailed explanation. It worked for me. I think the problem was that I forgot to include intent_entity_featurizer_regex in the config file.