Extract entities following a pattern

tbx · October 15, 2018, 9:22am

Hello everybody,

I’ve been working on my Rasa bot for about two weeks, and there is a problem I still haven’t figured out: how can I extract entities from an intent without providing a huge lookup table?

My goal is to extract entities based on a sentence pattern, without knowing their exact values before the user types them.

Here is an example of a potential user input:

I’m looking for an article about basketball

I want to recognize the intent “search” and the entity basketball.

To that end, I created training data in this form:

{
        "text": "I'm looking for an article about basketball",
        "intent": "search",
        "entities": [
          {
            "start": 33,
            "end": 43,
            "value": "basketball",
            "entity": "search_term"
          }
        ]
      }

And my stories look like this (here’s a sample):

* search{"search_term": "basketball"}
    - slot{"search_term": "basketball"}
    - utter_give_article

What I don’t know is how to train my bot to recognize entities in other inputs that follow the same pattern. If the user input is

I’m looking for an article about rock’n’roll

I would like to retrieve “rock’n’roll” as a “search_term” entity. Currently, the intent is correctly predicted, but the entity is rarely extracted if it doesn’t appear verbatim in the training data set. I don’t want an endless amount of training data and stories covering every possible search word.

As it is hard to predict all the potential entities a user might enter, I can’t create a lookup table for the “search_term” entity (or else it would be the entire dictionary…)

Thanks a lot for your help

TBX

souvikg10 · October 15, 2018, 11:25am

Use NER with a CRF (ner_crf) to learn a pattern that generalises over a sentence and finds the position where an entity is most likely to be. This is useful for broad entity recognition such as street names.

For example:

An NER model can learn that a street name is likely when the preceding word is Avenue or Rue (in French).

But you need at least 20-30 examples with different search_terms for the entity recognition to generalise.
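
One way to get that many varied examples without writing them all by hand is to generate them from a few sentence templates. The following is only a minimal sketch: the templates, the list of terms, and the `make_example` helper are hypothetical and not part of Rasa. It produces entries in the same JSON form as the training data shown earlier in the thread and computes the `start` and `end` offsets automatically.

```python
import json

# Hypothetical sentence templates and search terms, for illustration only.
TEMPLATES = [
    "I'm looking for an article about {term}",
    "Find me an article about {term}",
    "Do you have anything on {term}?",
]
TERMS = [
    "basketball",
    "rock'n'roll",
    "Saturn",
    "Freddie Mercury",
    "2022 Winter Olympics in Beijing",
]


def make_example(template, term, intent="search", entity="search_term"):
    """Build one training example with character offsets for the entity."""
    # The placeholder position in the template is where the term starts,
    # because nothing before it changes when the template is filled in.
    start = template.index("{term}")
    return {
        "text": template.format(term=term),
        "intent": intent,
        "entities": [
            {
                "start": start,
                "end": start + len(term),  # end offset is exclusive
                "value": term,
                "entity": entity,
            }
        ],
    }


examples = [make_example(t, term) for t in TEMPLATES for term in TERMS]
print(json.dumps(examples[0], indent=2))
```

Varying both the sentence frame and the length of the term (one word, two words, a longer phrase) gives the extractor a chance to learn the position of the entity rather than memorise specific values.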

tbx · October 22, 2018, 10:14am

Thank you for your answer @souvikg10.

I understand that NER can detect patterns, but apparently only if the pattern begins with a given word or contains a specific word (“Avenue” in your example).

In my use case, the entity to extract may take different forms.

I’m looking for an article about Saturn (one word)

I’m looking for an article about Freddie Mercury (two words)

I’m looking for an article about the Brexit (one word, but preceded by “the”)

I’m looking for an article about 2022 Winter Olympics in Beijing (five words)

…

Is increasing the number of examples in my training data the only solution? Or is there another technique to achieve this extraction?

Best regards

souvikg10 · October 22, 2018, 11:02am

That is not necessarily true. What I described was meant to give you an idea of how the CRF works: it learns which positions in a sentence are likely to hold an entity, given the features of the words around them, i.e. the most likely correlation between neighbouring words.

I already see a pattern in your sentences: the entity most likely comes after the word ‘about’. Articles are treated as stop words. See https://eli5.readthedocs.io/en/latest/tutorials/sklearn_crfsuite.html#feature-extraction

Take a look at the features you can pass to ner_crf in Rasa: https://www.rasa.com/docs/nlu/master/components/#ner-crf
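
To make the idea of context features concrete, here is a rough sketch of the kind of per-token features a CRF can use. It is written in plain Python and is not Rasa's actual implementation; the `token_features` function and the feature names are assumptions, loosely modelled on the style of the sklearn-crfsuite tutorial linked above. The point is that the feature "previous word is about" is present for every search term, whatever the term itself is.

```python
def token_features(tokens, i):
    """Describe tokens[i] using the token itself and its direct neighbours."""
    word = tokens[i]
    feats = {
        "bias": 1.0,
        "low": word.lower(),       # lowercased token
        "title": word.istitle(),   # starts with a capital letter
        "digit": word.isdigit(),   # made of digits only
    }
    if i > 0:
        feats["-1:low"] = tokens[i - 1].lower()  # previous token
    else:
        feats["BOS"] = True                      # beginning of sentence
    if i < len(tokens) - 1:
        feats["+1:low"] = tokens[i + 1].lower()  # next token
    else:
        feats["EOS"] = True                      # end of sentence
    return feats


# Simple whitespace tokenisation, enough for this illustration.
tokens = "I'm looking for an article about Saturn".split()
features = [token_features(tokens, i) for i in range(len(tokens))]
print(features[-1])
```

With enough varied examples, a CRF trained on features like these can associate "token after about" with the search_term label even for values it has never seen.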
