Extract Long Multi-word Entities

I am working on a chatbot that should extract long, multi-word entities, like so:

Search for [this is a search query]
Look online for [another query]

As you can see, I want to extract multi-word entities that have nothing in common textually other than the context in which they appear. In GitHub issue #797 on RasaHQ/rasa ("How to train ner_crf to extract 'title', 'description' like entities from a question?"), I saw that dependency parsing was recommended over the entity approach. I am unsure how to implement this in Rasa. Thank you!

The downside of queries is that they can take many forms, which makes it hard for an entity detector to find the ‘query’ part of the sentence. Is there a reason why a form wouldn’t work? In terms of accuracy, that might be more practical.

@koaning I want the user to be able to type their search query without being asked for it. For example, they should be able to type in

Text Sarah Hello there. How are you today?

and have the bot extract the entity. With a form, they would have to retype what they had already typed.

Hi @koaning, just wanted to follow up.

@koaning, it’s not just for queries. As you can see above, I may also want to extract the contents of a message the user wants to send.

Just to check that I understand: you’ve got an NLU.md file with contents such as:

## intent:send_text
- text [Sarah](person) [Hello there](message)
- send [Tim](person) a text saying "[Whatsup?!](message)"

And you’re wondering what the best method is of extracting both the name as well as the message?

To be honest, I don’t see how dependency parsing would help you much here. You can certainly use dependency parsing in spaCy as a building block to fetch entities (and I wrote a blog post on how to pass those entities to Rasa here), and another benefit of spaCy is that it also has a pretrained model to detect names of people …

… but I’d argue that something like a general message or a query is hard to detect reliably as an entity using machine learning. A form might sound like a less-than-ideal user interface, but I’d argue so is an inaccurate entity detector. If you’re set on doing this 100% with ML, I wouldn’t underestimate the amount of work involved, including a lot of data collection.

Come to think of it, another idea could be to teach your users to follow a pattern that a regex can pick up. Say that users must place their message between double quotes (").

## intent:send_text
- text [Sarah](person) ["Hello there"](message)
- send [Tim](person) a text saying ["Whatsup?!"](message)

The quoted message can then be extracted with a regex pattern. You could have an interaction along these lines:

User> Text Dave Hello
Bot> I understand `Dave` but you need to use "" to tell me what to send. 
User> Text Dave "Hello" 
Bot> Sending `Dave` a text message saying `Hello` now ...  
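A minimal sketch of the quoted-message idea in plain Python, assuming the message is always wrapped in double quotes and the recipient is the single word right after `text` or `send`. The names `parse_text_command` and `reply` and the reply strings are hypothetical, and wiring this into Rasa (for example through a custom component or a custom action) is not shown.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: "text" or "send", a one-word recipient, any filler
# (e.g. "a text saying"), then the message between double quotes.
QUOTED_COMMAND = re.compile(
    r'^(?:text|send)\s+(?P<person>\w+).*?"(?P<message>[^"]+)"',
    re.IGNORECASE,
)
# Fallback used only to recover the recipient when the quotes are missing.
RECIPIENT_ONLY = re.compile(r"^(?:text|send)\s+(?P<person>\w+)", re.IGNORECASE)


def parse_text_command(utterance: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (person, message); either is None if it cannot be found."""
    text = utterance.strip()
    match = QUOTED_COMMAND.search(text)
    if match:
        return match.group("person"), match.group("message")
    fallback = RECIPIENT_ONLY.search(text)
    return (fallback.group("person") if fallback else None), None


def reply(utterance: str) -> str:
    """Mimic the bot replies from the interaction above."""
    person, message = parse_text_command(utterance)
    if person is None:
        return "Sorry, who do you want to text?"
    if message is None:
        return f'I understand `{person}` but you need to use "" to tell me what to send.'
    return f"Sending `{person}` a text message saying `{message}` now ..."


print(reply("Text Dave Hello"))
print(reply('Text Dave "Hello"'))
```

Because the quotes delimit the message, the regex never has to guess where the message starts or ends; the fallback pattern exists only so the bot can ask for the quotes instead of failing silently.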

You really have two options:

  • write a bunch of regexes for phrasings like `text [X] [Y]`, `send [X] a message saying [Y]` and `search for [X]`
  • create an entity recognition model with Rasa
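For the first option, here is a sketch of what "a bunch of regexes" could look like in plain Python, assuming users stick to the phrasings listed (plus the "look online for" variant from the original question). The intent names `send_text` and `search`, the group names and the pattern list are hypothetical; the recipient in `text [X] [Y]` is assumed to be a single word, since otherwise there is no way to tell where it ends and the message begins.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# Hypothetical (intent, pattern) pairs, tried in order; the first match wins.
PATTERNS: List[Tuple[str, Pattern[str]]] = [
    (
        "send_text",
        re.compile(
            r"^send\s+(?P<person>\w+)\s+a\s+(?:message|text)\s+saying\s+(?P<message>.+)$",
            re.IGNORECASE,
        ),
    ),
    (
        "send_text",
        # Assumes the recipient is exactly one word after "text".
        re.compile(r"^text\s+(?P<person>\w+)\s+(?P<message>.+)$", re.IGNORECASE),
    ),
    (
        "search",
        re.compile(r"^(?:search\s+for|look\s+online\s+for)\s+(?P<query>.+)$", re.IGNORECASE),
    ),
]


def extract(utterance: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (intent, entities) for the first matching pattern, else None."""
    text = utterance.strip()
    for intent, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return intent, match.groupdict()
    return None  # no known phrasing matched; hand off to a model or ask the user


print(extract("Text Sarah Hello there. How are you today?"))
print(extract("Search for this is a search query"))
```

The ordering matters: the more specific `send ... saying ...` pattern is tried before anything looser, and any phrasing outside the list returns None rather than a wrong guess.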

Neither of these will catch 100% of cases. If you can assume the user will stick to a few common phrasings, use regexes. If you want more flexibility, you’ll need to train a model, but you’ll need a fair bit of data to get it to work, and it will never be 100% accurate.