Problem with regex recognition in Rasa NLU

Hi guys, I am trying to extract an entity through a regex pattern, which consists only of alphabetic characters [a-z] (I have also added a few optional characters). I have also added some examples of it. Please look into it and let me know what is wrong with the data. Thanks.

regex: sBook

  • [^a-zA-Z0-9!#$()-` + / \ "]*

##intent: searchBook

I have up to a hundred training examples for the entity sBook, but after training, the NLU was not able to recognize it. For example:

search for the book machine made by me

It was not able to recognize the entity. Thanks in advance.

First of all, that regex looks like it will match characters which are NOT present in the set, because the set starts with ^.
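To illustrate the point about the leading ^, here is a minimal Python sketch. The two patterns are simplified stand-ins (lowercase letters and spaces only), not the exact pattern from the question, and the variable names are made up for the example.

```python
import re

# A leading ^ inside [...] negates the class: it matches anything
# that is NOT one of the listed characters.
negated = re.compile(r"[^a-z ]+")   # simplified stand-in, not the original pattern
positive = re.compile(r"[a-z ]+")   # same class without the ^

text = "machine made by me"

# The negated class cannot match a string made only of letters and spaces.
print(negated.fullmatch(text))

# Without the ^, the whole string matches.
print(positive.fullmatch(text))
```

So if the intent is "only these characters", the ^ at the start of the class has to go.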

Secondly, you don’t need the regex feature for this kind of entity, since it is almost always composed of normal characters and is positioned like a noun in a sentence. Your training data alone should be enough. So my advice is to remove the regex feature.

Thanks for your reply @fuih. I tried without the regex as well and faced the same issue.

That’s certainly weird. Can you upload your nlu.md?

please look into it

##intent: SearchIntent

Thanks

Question { "text": "search for the book alice by james" }

Response: { "intent": { "name": "SearchIntent", "confidence": 0.9857004881 }, "entities": [ { "start": 29, "end": 34, "value": "james", "entity": "Person", "confidence": 0.9153108498, "extractor": "CRFEntityExtractor" } ], "intent_ranking": [ { "name": "SearchIntent", "confidence": 0.9857004881 }, { "name": "MagicCodeIntent", "confidence": 0.0256307088 }, { "name": "goodbye", "confidence": 0.001080066 }, { "name": "PickUpIntent", "confidence": 0 }, { "name": "listCheckedOutIntent", "confidence": 0 }, { "name": "mood_unhappy", "confidence": 0 }, { "name": "AMAZON.CancelIntent", "confidence": 0 }, { "name": "HoursIntent", "confidence": 0 }, { "name": "ReloadIntent", "confidence": 0 }, { "name": "ListHoldIntent", "confidence": 0 } ], "text": "search for the book alice by james" }

Can you upload nlu.md as a file please. It will make it easier for me to copy the data and test it.

nlu.md (61.5 KB)

please look into the above

I trained a test model and tried your sentence:

The bot recognized ‘machine made by me’ as the sBook entity.

So in my opinion, the problem is that you have a lot of books whose names are extremely long. This might make it hard to recognise the sBook and Person entities when they are in the same sentence.

Finally, I found a book named

a new and authentic system of universal geography antient and modern including all the late important discoveries made by the english

You see, the words ‘made by’ are actually part of a book’s name. But you don’t have any training data like:

search for a book [Book’s name] made by [person]

So basically you should try adding a couple of training examples in the above format, and maybe cut out the training data with really long book names (I’m not sure if this will help, give it a try).
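If it helps, examples in that format can be generated rather than typed by hand. This is only a rough sketch: the book and person values are hypothetical placeholders, the helper name `make_examples` is made up, and it assumes the `[value](entity)` annotation style of the Rasa Markdown training format with the sBook and Person entities and the SearchIntent intent from this thread.

```python
from itertools import product

# Hypothetical sample values; replace with names from your own data.
books = ["alice", "harry potter", "machine"]
persons = ["james", "me"]

# Templates that cover both "made by" and plain "by" phrasing.
templates = [
    "search for the book [{book}](sBook) made by [{person}](Person)",
    "search for the book [{book}](sBook) by [{person}](Person)",
]

def make_examples(templates, books, persons):
    """Return one annotated training line per template/book/person combination."""
    return [
        "- " + t.format(book=b, person=p)
        for t, b, p in product(templates, books, persons)
    ]

examples = make_examples(templates, books, persons)
print("## intent:SearchIntent")
print("\n".join(examples))
```

The printed lines can be pasted under the intent in nlu.md. Mixing short and long book names in the list gives the extractor examples of both.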

P/S: I ran a couple of tests and, except for the case above, the entities are recognised correctly, even if the book’s name is long.

Thanks @fuih for your reply. I don’t know what the problem is with this data:

“search for the book alice by james”

Both the book name and the author name are dynamic, and I want both to be identified. As of now I am getting only one entity, Person; the book name was not recognized.

Yeah, I tried that and had the same result as you. But if I try ‘search for the book harry potter by james’, the bot recognises the entities correctly. So maybe it thinks ‘alice’ is too short to be a book name? I have no idea :smiley:.

@fuih I face this issue in a few other places too, even when the book names are long, and I don’t know how to handle it, because I need both entities (book name and author name) to be dynamic. Can anyone give me some idea how to overcome it? Thanks.

Can anyone tell me what the problem is here?

Hi @prasanth55555. You should use an NER model to extract such entities (just like you are doing now). However, here you are dealing with ambiguity: alice and james are both names, and it’s likely that the model will struggle to figure out where a name is actually a person’s name and where it is a book name. To improve the extraction of book names, you could try using lookup tables (if you have a list of the books that can appear in the user questions).
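As a rough sketch of how a lookup table could be prepared from a list of titles: the helper `render_lookup` and the sample titles are hypothetical, lowercasing and de-duplicating are assumptions rather than requirements, and the `## lookup:<entity>` header assumes the Rasa Markdown training format.

```python
def render_lookup(entity, values):
    """Render a Markdown lookup-table section for one entity."""
    # Normalise whitespace and case, drop empties and duplicates (an assumption).
    cleaned = sorted({v.strip().lower() for v in values if v.strip()})
    lines = ["## lookup:" + entity] + ["- " + v for v in cleaned]
    return "\n".join(lines)

# Hypothetical titles; in practice these would come from the book catalogue.
titles = ["Alice", "harry potter", "alice ", "machine"]
section = render_lookup("sBook", titles)
print(section)
```

The lookup table only adds a feature for the extractor, so the annotated training examples for sBook are still needed alongside it.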

@Juste I tried with a lookup table and it’s working fine. Thanks.