Is the dependency parse from spaCy currently used in Rasa in any way to improve entity extraction?
At the moment no, we don’t implement it. How do you know it would increase the accuracy of entity extraction?
I don’t, obviously. What I meant is that, in general, the idea of interpretation in a conversational setting is to perform some sort of semantic parsing of the sentence, extracting all the relations and their related entities into a machine-friendly logical form, like lambda calculus or something of that sort.
Rasa’s interpreter stops short of that and simplifies the problem into predicting an intent and extracting entities. However, even in the DataCamp course taught by Alan Nichol, there is a discussion and an exercise on using the dependency parse from spaCy to better decide which slots to fill with which entities. A good example, I guess, would be “I want to fly from NY to SF”. To get the from and to entities right in Rasa at the moment, you need to either train a custom NER for “from_city” and “to_city” or apply some heuristics on top of regular entity extraction, so that the entities end up in the right slots. Instead, you could imagine extracting the dependency between “NY” and “SF” from the dependency parse of the sentence and using that to fill the slots.
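To make the idea concrete, here is a rough sketch of how the dependency parse could drive slot filling. It is only an illustration, not something Rasa does: the helper names `assign_route_slots` and `entity_heads_from_spacy` are made up, the model name `en_core_web_sm` is an assumption, and whether spaCy actually tags “NY” and “SF” as entities and attaches them to “from” and “to” depends on the model.

```python
def assign_route_slots(entity_heads):
    """Map (entity_text, governing_word) pairs to from_city / to_city slots.

    Hypothetical helper: the slot names follow the example in this thread.
    """
    slot_for_preposition = {"from": "from_city", "to": "to_city"}
    slots = {}
    for entity_text, head_text in entity_heads:
        slot = slot_for_preposition.get(head_text.lower())
        if slot is not None:
            slots[slot] = entity_text
    return slots


def entity_heads_from_spacy(text, model="en_core_web_sm"):
    """Pair each named entity with the word that governs it in the parse.

    Assumes spaCy and the given English model are installed.
    """
    import spacy  # imported lazily so the helper above works without spaCy

    nlp = spacy.load(model)
    doc = nlp(text)
    # ent.root is the syntactic head token of the entity span; the head of
    # that token is typically the preposition ("from" / "to") governing it.
    return [(ent.text, ent.root.head.text) for ent in doc.ents]


# Hand-written pairs standing in for what a parse might yield:
print(assign_route_slots([("NY", "from"), ("SF", "to")]))
# {'from_city': 'NY', 'to_city': 'SF'}
```

With spaCy installed, `assign_route_slots(entity_heads_from_spacy("I want to fly from NY to SF"))` would run the whole chain; the result depends on what the model recognises.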
So basically I am wondering if there are plans to move towards real parsing of sentences in addition to just intent classification+entity extraction strategy to be able to better capture the information.
I agree with you in this case. It could make for a decent component. However, we have no plans to build this kind of parser at the moment, because we could also handle this conversation by asking for the information in two stages (“Where are you flying from?”, “And where to?”).
Entity extraction is always a big focus for us though. We recently reviewed the idea of including composite entities (which would also help in the example you gave), and the superstar team at CarLabs built a custom component for this called Innatis. There are some good libraries on GitHub for dependency parsing, so I would definitely recommend taking a stab at building a custom component if you’d like to try it out with our code. @Juste just wrote a great article on how to do that here.
What do you mean by ‘including composite entities’?
I think a dependency parser may be too much for a bot. You’re probably not trying to extract meaning from syntax but from relations between entities. I just posted a question on relation extraction: Relation Extraction (between entities)
Maybe you have an answer
You could use this custom component of mine that I use for composite entities. It doesn’t use heuristics, you rather define your composites via placeholders. For your example “I want to fly from NY to SF”, you could train your base entity “city” with values “NY” and “SF” and then define a composite entity “route” with value “@city to @city”. The component would then group these two city entities for you.
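For anyone wondering what that grouping looks like in practice, here is a minimal sketch of the idea. It is not the component’s actual code: the function names are made up, and it assumes entities arrive in the usual Rasa NLU shape (dicts with `entity`, `value`, `start`, `end`) and that the composite definition can be treated as a regular expression over the message with entity values swapped for `@entity` placeholders.

```python
import re


def to_placeholders(text, entities):
    """Replace each entity value in the text with '@<entity name>'.

    Returns the rewritten text and, for each placeholder, its
    (start, end) position in the rewritten text plus the source entity.
    """
    pieces, spans, cursor, length = [], [], 0, 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        before = text[cursor:ent["start"]]
        token = "@" + ent["entity"]
        start = length + len(before)
        pieces.append(before + token)
        length = start + len(token)
        spans.append((start, length, ent))
        cursor = ent["end"]
    pieces.append(text[cursor:])
    return "".join(pieces), spans


def group_composites(text, entities, composites):
    """Group base entities that fall inside a composite pattern match."""
    templated, spans = to_placeholders(text, entities)
    results = []
    for name, pattern in composites.items():
        for match in re.finditer(pattern, templated):
            members = [
                ent for start, end, ent in spans
                if match.start() <= start and end <= match.end()
            ]
            results.append({"entity": name, "contained": members})
    return results


text = "I want to fly from NY to SF"
entities = [
    {"entity": "city", "value": "NY", "start": text.index("NY"), "end": text.index("NY") + 2},
    {"entity": "city", "value": "SF", "start": text.index("SF"), "end": text.index("SF") + 2},
]
routes = group_composites(text, entities, {"route": "@city to @city"})
print([ent["value"] for ent in routes[0]["contained"]])  # ['NY', 'SF']
```

The order of the grouped entities follows their order in the message, so the first city can be read as the origin and the second as the destination.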
Hey, your component is really great! I’m not able to make the patterns work like regex patterns, though. I’m trying something like this (just an example of how I’m writing the patterns): @color? @fabric (@pattern @color)+ But it’s not working. Could you explain how to do it?
Your regex is not doing what you think it’s doing! You can test your regex definition online for example at https://regex101.com.
The ? quantifier only acts on the character directly before it. You’d have to enclose @color in a capturing group to make the whole entity optional.
Also, you have to think about whitespace. If you write @color? @fabric and @color is missing, the regex still expects a space before @fabric.
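Both points are easy to check with Python’s `re` module. This is just a quick demonstration on the placeholder strings themselves, assuming the component hands the pattern to a regex engine more or less as written:

```python
import re

# The pattern as written: "?" applies only to the "r" of "@color",
# and the literal space before "@fabric" is always required.
naive = re.compile(r"@color? @fabric")

print(bool(naive.fullmatch("@color @fabric")))  # True
print(bool(naive.fullmatch("@colo @fabric")))   # True: only the "r" is optional
print(bool(naive.fullmatch("@fabric")))         # False: "@colo " is still required

# Putting the placeholder and its trailing space inside a group
# makes the whole entity optional.
grouped = re.compile(r"(@color )?@fabric")

print(bool(grouped.fullmatch("@fabric")))         # True
print(bool(grouped.fullmatch("@color @fabric")))  # True
```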
If you take a look at the example regex in the component’s readme page, you’ll see that it wraps optional entities in groups. If you are not familiar with regexes, you might want to use those groups as a template for optional entities. Each one is an optional non-capturing group, written (?:...)?, that checks for the string @pattern and accepts one or more whitespace characters by using \\s+ (note the double backslash, which is required to properly escape the backslash in the Rasa training file). When using this, you don’t want to include explicit whitespace between entities.
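Here is a small sketch of that template applied to the @color and @fabric entities from the question. The snippet is hypothetical and not copied from the readme; in particular it assumes the training file is JSON, which is where a double backslash would be needed to end up with a single one in the regex.

```python
import json
import re

# Hypothetical training-file fragment (JSON assumed): "\\s" in the file
# is decoded to the regex token "\s".
raw = r'{"pattern": "(?:@color\\s+)?@fabric"}'
pattern = json.loads(raw)["pattern"]
print(pattern)  # (?:@color\s+)?@fabric

optional_color = re.compile(pattern)

# The whitespace lives inside the optional group, so no explicit
# space is written between the two entities.
for text in ["@fabric", "@color @fabric", "@color   @fabric"]:
    print(repr(text), bool(optional_color.fullmatch(text)))
```

All three strings match: the first because the whole group is skipped, the other two because `\s+` absorbs one or more spaces.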
Also, make sure your regex matches the actual content of the message, not only the entities. For your pattern, I could imagine a user uttering “I am looking for red silk with blue stripes”. In that case, the “with” should be part of your pattern (you can make it optional) and the @color entity should come before the @pattern entity.
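Put together, a pattern for that utterance could look roughly like the sketch below. It assumes the pattern is matched against the message after the entity values have been replaced by their placeholders, which is my reading of how the component works rather than something confirmed here; the variable names are only for illustration.

```python
import re

# Assumed view of "I am looking for red silk with blue stripes" once the
# entity values have been swapped for placeholders.
templated = "I am looking for @color @fabric with @color @pattern"

composite = re.compile(
    r"(?:@color\s+)?@fabric"                  # optional leading color, then the fabric
    r"(?:\s+(?:with\s+)?@color\s+@pattern)+"  # one or more "[with] color pattern" parts
)

match = composite.search(templated)
print(match.group(0) if match else None)
# @color @fabric with @color @pattern
```

The same pattern also matches when the leading color or the “with” is left out, for example “@fabric @color @pattern”.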
Great, thank you! I was thinking about it all wrong.