How to correctly extract title and name at same time

naamtokyam · January 2, 2022, 9:34pm

Given a sentence such as "am i borrowing [the great gatsby]{book_title} by [F. Scott Fitzgerald]{PERSON}?", I want to extract two entities from it: book_title and PERSON (the book's author). So far I have tried DIETClassifier for extracting the title of the book (it is also used for extracting other entities) and SpacyEntityExtractor with the PERSON dimension for the author’s name.

However, I have two problems (and thus two questions) with combining the entity extractors:

DIETClassifier extracts book_title incorrectly. It recognizes multiple titles, labels the author’s name as the book title, or fails to recognize the title at all. For example, when I type "am i borrowing the devil's notebook by anton szandor lavey?", Rasa sets the entities as

entities '[{'entity': 'book_title', 'start': 15, 'end': 26, 'confidence_entity': 0.7182283997535706, 'value': "the devil's", 'extractor': 'DIETClassifier'}, {'entity': 'book_title', 'start': 45, 'end': 58, 'confidence_entity': 0.9045381546020508, 'value': 'szandor lavey', 'extractor': 'DIETClassifier'}, {'entity': 'PERSON', 'value': 'anton szandor lavey', 'start': 39, 'confidence': None, 'end': 58, 'extractor': 'SpacyEntityExtractor'}].

What would be a better way to extract book_title in my case? Would a lookup table be a good idea for both the book title and the author’s name? How large can a lookup table be?

SpacyEntityExtractor does well at extracting a person’s name. For example, "am i borrowing the book called Harry Potter written by JK Rowling?" gave me a result of

'[{'entity': 'book_title', 'start': 15, 'end': 18, 'confidence_entity': 0.7037492990493774, 'value': 'the', 'extractor': 'DIETClassifier'}, {'entity': 'book_title', 'start': 31, 'end': 43, 'confidence_entity': 0.9985925555229187, 'value': 'Harry Potter', 'extractor': 'DIETClassifier'}, {'entity': 'PERSON', 'value': 'Harry Potter', 'start': 31, 'confidence': None, 'end': 43, 'extractor': 'SpacyEntityExtractor'}, {'entity': 'PERSON', 'value': 'JK Rowling', 'start': 55, 'confidence': None, 'end': 65, 'extractor': 'SpacyEntityExtractor'}]'.

However, a book title can also be tricky because it may itself include a person’s name (e.g. Harry Potter), which spaCy then extracts as PERSON as well. How can I write logic to decide which PERSON entity is the author name we actually want to keep?

Thank you for the help!

j.mosig · January 3, 2022, 11:32am

Hello @naamtokyam

In principle, DIET should be able to extract both book title and author, but it’ll need lots of training examples (I guess 1000+?). Book titles are especially difficult, because they can be quite long and variable. If this doesn’t work for you, could you make your bot ask for one thing at a time?

naamtokyam · January 5, 2022, 8:10pm

Hi @j.mosig, thank you! I will try with more examples. So would you recommend using DIET for the name extraction?

j.mosig · January 7, 2022, 11:21am

Actually, the user could also just enter a title without preamble or explaining what it is that they are typing. If the title is just a person’s name, how could DIET know if this is a title or an author?

You could use the RegexEntityExtractor in addition to DIET and train it with lookup tables that only include exact author names. But that’s tedious, because you have to update the Rasa model each time your database changes.
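
To make the lookup-table idea concrete, here is a rough Python sketch of what exact-name matching amounts to. It is not Rasa's actual implementation of RegexEntityExtractor; `AUTHOR_NAMES`, `build_lookup_pattern` and `extract_authors` are hypothetical names, and the entity label `book_author` is only an assumption based on the question.

```python
import re

# Hypothetical lookup entries; in practice these would come from the book database.
AUTHOR_NAMES = ["JK Rowling", "F. Scott Fitzgerald", "Anton Szandor LaVey"]


def build_lookup_pattern(entries):
    """Compile one case-insensitive regex that matches any entry as a whole phrase."""
    # Longest entries first, so a longer name wins over a shorter name it contains.
    ordered = sorted(entries, key=len, reverse=True)
    alternatives = "|".join(re.escape(entry) for entry in ordered)
    # Lookarounds instead of \b, because entries may start or end with punctuation.
    return re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)


def extract_authors(text, pattern):
    """Return Rasa-style entity dicts for every lookup match in the text."""
    return [
        {"entity": "book_author", "start": m.start(), "end": m.end(), "value": m.group(0)}
        for m in pattern.finditer(text)
    ]


pattern = build_lookup_pattern(AUTHOR_NAMES)
matches = extract_authors(
    "am i borrowing the devil's notebook by anton szandor lavey?", pattern
)
```

Because only exact entries match, a name that is missing from the list is never extracted, which is why the list (and with it the model) has to be refreshed whenever the database changes.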

You could also have a single entity, keyword, with roles author, title, isbn, etc., and assign the role if it is clear or no role if it is not clear. Use this to train DIET and write a custom action that checks the database to see if it can find some matching entries. If you do this, note that we might tweak roles/groups in the future so that they are assigned by the model without looking at the entity value, so it’d only look at surrounding text to decide what the role is. This is useful in most cases, but you’d have to design your training data such that roles are only assigned whenever they can actually be inferred from surrounding words alone.
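
The database check inside such a custom action could look roughly like the Python sketch below. It deliberately leaves out the rasa_sdk wiring and only shows the role resolution; `CATALOGUE` and `resolve_role` are hypothetical, the in-memory list stands in for a real database query, and the `keyword` entity with an optional `role` key follows the setup described above.

```python
# Hypothetical in-memory stand-in for the library database.
CATALOGUE = [
    {"title": "Harry Potter", "author": "JK Rowling"},
    {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald"},
    {"title": "The Devil's Notebook", "author": "Anton Szandor LaVey"},
]


def resolve_role(entity):
    """Return 'title', 'author', 'ambiguous' or None for a 'keyword' entity."""
    # Trust the role the model assigned from the surrounding words, if any.
    if entity.get("role"):
        return entity["role"]
    value = entity["value"].strip().lower()
    is_title = any(book["title"].lower() == value for book in CATALOGUE)
    is_author = any(book["author"].lower() == value for book in CATALOGUE)
    if is_title and is_author:
        return "ambiguous"  # the bot should ask the user which one they mean
    if is_title:
        return "title"
    if is_author:
        return "author"
    return None  # nothing in the catalogue matches


entities = [
    {"entity": "keyword", "value": "Harry Potter"},  # no role inferred
    {"entity": "keyword", "value": "JK Rowling", "role": "author"},
]
roles = [resolve_role(e) for e in entities]
```

With this split, the model only has to find the span, and the catalogue decides what the span is whenever the sentence itself gives no hint. A real action would probably want fuzzy matching rather than exact string equality.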
