Extract entities following a pattern

Hello everybody,

I’ve been working on my Rasa bot for about two weeks, and there is a problem I still haven’t figured out: how can I extract entities from a user message without providing a huge lookup table?

My goal is to extract entities based on the sentence pattern, without knowing their exact values before the user types them.

Here is an example of a potential user input:

I’m looking for an article about basketball

I want to recognize the intent “search” and the entity basketball.

To that end, I created training data in this form:

    {
        "text": "I'm looking for an article about basketball",
        "intent": "search",
        "entities": [
            {
                "start": 33,
                "end": 43,
                "value": "basketball",
                "entity": "search_term"
            }
        ]
    }

My stories look like this (here’s a sample):

* search{"search_term": "basketball"}
    - slot{"search_term": "basketball"}
    - utter_give_article

However, I don’t know how to train my bot to recognize entities that follow the same pattern. If the user input is

I’m looking for an article about rock’n’roll

I would like to retrieve “rock’n’roll” as a “search_term” entity. Currently, the intent is guessed correctly, but the entity is rarely extracted if it doesn’t appear verbatim in the training data. I don’t want an endless amount of training data and stories covering every possible search word.

Since it is impossible to predict all the entities a user might enter, I can’t create a lookup table for the “search_term” entity (it would have to be the entire dictionary…).

Thanks a lot for your help



Use ner_crf (the CRF-based entity extractor) to learn a pattern that generalises over a sentence and finds where an entity is most likely to be. This is useful for broad entity recognition such as street names.

For example, you can generalise an NER model to detect a likely street name when the preceding word is Avenue or Rue (in French).

But you need at least 20-30 examples with different search_terms for the entity recognition to generalise.
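A minimal sketch of how such examples could be generated instead of written by hand: it fills a sentence template with different search terms and computes the character offsets for each one. The templates, the term list, and the helper name `make_example` are assumptions for illustration; only the JSON keys come from the training data shown earlier in the thread.

```python
import json

# Hypothetical templates and terms, for illustration only.
TEMPLATES = [
    "I'm looking for an article about {}",
    "Find me an article about {}",
]
TERMS = ["basketball", "Saturn", "Freddie Mercury", "rock'n'roll"]


def make_example(template, term, intent="search", entity="search_term"):
    """Build one training example with character offsets for the entity."""
    text = template.format(term)
    start = template.index("{}")  # the term begins where the placeholder was
    return {
        "text": text,
        "intent": intent,
        "entities": [
            {
                "start": start,
                "end": start + len(term),
                "value": term,
                "entity": entity,
            }
        ],
    }


examples = [make_example(t, term) for t in TEMPLATES for term in TERMS]
print(json.dumps(examples[0], indent=2))
```

Varying both the surrounding sentence and the length of the term (one word, two words, and so on) gives the extractor a better chance of learning the position of the entity rather than memorising its values.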


Thank you for your answer @souvikg10.

I understand that an NER model can detect patterns, but presumably only if the pattern begins with or contains a specific word (“Avenue” in your example).

In my use case, the entity to extract may take different forms.

  • I’m looking for an article about Saturn (one word)
  • I’m looking for an article about Freddie Mercury (two words)
  • I’m looking for an article about the Brexit (one word, but preceded by “the”)
  • I’m looking for an article about 2022 Winter Olympics in Beijing (five words)

Is increasing the number of examples in my training data the only solution, or is there another technique to achieve this extraction?

Best regards


That is not necessarily true, though. What I described was meant to give you an idea of how the CRF works: it learns which positions in a sentence are likely to hold an entity given the features of the words around them, essentially a correlation between neighbouring words.

I already see a pattern in your sentences: your entity most likely comes after the word ‘about’. Articles are treated as stop words. See https://eli5.readthedocs.io/en/latest/tutorials/sklearn_crfsuite.html#feature-extraction

Take a look at the features you can pass to ner_crf in Rasa: https://www.rasa.com/docs/nlu/master/components/#ner-crf
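To make this concrete, here is a rough sketch of the kind of per-token features a CRF tagger looks at, in the style of the sklearn-crfsuite tutorial linked above. The function names `token_features` and `sentence_features` and the feature keys are illustrative assumptions, not Rasa's internal implementation. The point is that the first token of the search term always carries the feature `prev.lower = about`, whatever the term itself is.

```python
def token_features(tokens, i):
    """Features for tokens[i], including its left and right neighbours."""
    word = tokens[i]
    features = {
        "bias": 1.0,
        "lower": word.lower(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    if i > 0:
        features["prev.lower"] = tokens[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        features["next.lower"] = tokens[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sentence_features(sentence):
    """Split a sentence and compute features for every token."""
    tokens = sentence.split()  # naive whitespace tokenisation, for illustration
    return tokens, [token_features(tokens, i) for i in range(len(tokens))]


tokens, feats = sentence_features("I'm looking for an article about Saturn")
about_idx = tokens.index("about")
print(feats[about_idx + 1])  # features of the token right after "about"
```

Because the model weighs context features like these alongside the word itself, it can tag a term it has never seen, provided the training data contains enough different terms in that position.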