Hi guys, I am trying to extract an entity through a regex pattern which consists only of alphabetic characters [a-z] (I have also added a few optional characters). I have also added some examples of it. Please look into it and let me know what is wrong with the data. Thanks.
First of all, that regex looks like it will match characters which are NOT present in the set.
Secondly, you don’t need the regex feature for this kind of entity, since it is almost always composed of normal characters and is positioned like a noun in a sentence. Using your training data alone should be enough. So my advice is to remove the regex feature.
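To illustrate the first point, here is a minimal Python sketch. The pattern below is a hypothetical stand-in, since the original regex is not shown in this thread. It only demonstrates one common way a character-class pattern can still "match" text that contains characters outside the set: an unanchored search succeeds as soon as any substring fits.

```python
import re

# Hypothetical pattern (NOT the one from the original post):
# lowercase letters and spaces only.
PATTERN = re.compile(r"[a-z ]+")

def matches_anywhere(text: str) -> bool:
    """True if the pattern matches any substring of text."""
    return PATTERN.search(text) is not None

def matches_whole(text: str) -> bool:
    """True only if every character of text belongs to the set."""
    return PATTERN.fullmatch(text) is not None

sample = "harry potter 2!"
print(matches_anywhere(sample))  # True: "harry potter " is a valid substring
print(matches_whole(sample))     # False: "2" and "!" are outside the set
```

If the regex feature is kept at all, checking it against the whole candidate string (as in `matches_whole`) is usually closer to what is intended.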
After training a test model, I tried your sentence, and the bot recognized ‘machine made by me’ as an sBook entity.
So in my opinion, the problem is that you have a lot of books whose names are extremely long. This might make it hard to recognise the sBook and person entities when they are in the same sentence, since the model has learned that a book’s name can span many words.
Finally, I found a book named:
a new and authentic system of universal geography antient and modern including all the late important discoveries made by the english
You see, the words ‘made by’ are actually part of a book’s name. But you don’t have any training data like:
search for a book [Book’s name] made by [person]
So basically you should try adding a few training examples in the above format, and maybe cut out the training examples which have really long book names (I’m not sure if this will help, but give it a try).
P/S: I ran a couple of tests and, except for the case above, the entities are recognised correctly, even if the book’s name is long.
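The suggested training examples could be generated with a small script like the one below. This is only a sketch: the sample book and person values are placeholders taken from this thread, the entity label `person` is an assumption about how the person entity is named in your data, and the `[text](entity)` annotation style is assumed to match your training file format.

```python
from itertools import product

# Placeholder values; replace them with names from your own data.
BOOKS = ["alice", "harry potter", "machine"]
PERSONS = ["james", "me"]

# Connectors seen in this thread that separate the book from the person.
CONNECTORS = ["made by", "by"]

def make_examples(books, persons, connectors):
    """Return annotated examples in the [text](entity) style."""
    examples = []
    for book, connector, person in product(books, connectors, persons):
        examples.append(
            f"- search for a book [{book}](sBook) {connector} [{person}](person)"
        )
    return examples

for line in make_examples(BOOKS, PERSONS, CONNECTORS):
    print(line)
```

Mixing short and long book names in the same sentence pattern gives the model evidence that ‘made by’ usually marks the boundary between the two entities.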
Yeah, I tried that and got the same result as you. But when I tried ‘search for the book harry potter by james’, the bot recognised the entities correctly. So maybe it thinks ‘alice’ is too short to be a book name? I have no idea.
@fuih I face this issue in a few other places too, even when the book names are long. I don’t know how to handle it, because I need both entities to be dynamic (book name and author name). Can anyone give me some ideas to overcome it?
Hi @prasanth55555. You should use an NER model to extract such entities (just as you are doing now). However, here you are dealing with ambiguity: alice and james are both names, and it is likely that the model will struggle to figure out where a name is a person’s name and where it is a book name. To improve the extraction of book names, you could try using lookup tables (if you have a list of the books that can appear in user questions).
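To show the idea behind a lookup table, here is a rough Python sketch. It is only an illustration, not the NLU library’s own implementation; in practice you would use the library’s lookup-table feature. The title list, including ‘alice in wonderland’, is hypothetical.

```python
import re

# Hypothetical list of known book titles (the "lookup table").
BOOK_TITLES = ["alice", "harry potter", "alice in wonderland"]

def build_lookup_pattern(titles):
    """Compile one regex that matches any known title as a whole phrase."""
    # Longest titles first, so "alice in wonderland" wins over "alice".
    ordered = sorted(titles, key=len, reverse=True)
    alternatives = "|".join(re.escape(title) for title in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

def find_books(text, pattern):
    """Return every known title found in text, in order of appearance."""
    return [match.group(0) for match in pattern.finditer(text)]

pattern = build_lookup_pattern(BOOK_TITLES)
print(find_books("search for the book alice by james", pattern))
```

Because ‘james’ is not in the list, only ‘alice’ is flagged as a possible book name, which gives the model an extra signal for telling the two entities apart.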