I am using a lookup table to extract a “country” entity, which has about 196 values.
The lookup table looks something like this:
Sierra Leone
Puerto Rico
Belgium
Palau
Belize
Indonesia
Brunei
Macao
Hong Kong
Nicaragua
South Africa
Montserrat
Syria
Australia
Jordan
Guinea
Libya
Paraguay
St. Lucia
Israel
Nigeria
Barbados
Kazakstan
Aland Islands
Ideally I should get “hong kong” as one single value. Can somebody help, or at least explain why this happens with a lookup table? The CRF works fine with multi-word entities that are present in the training data, but when these values are only in the lookup table we get such results.
I just tried to reproduce the issue. For me everything seems to be working. Just to clarify, if you add the lookup table to your NLU data, Rasa predicts
Yes, but “hong kong” is not two different entities; it should come back as one single value.
I am assuming hong kong is not present in the training data. If [hong kong] is in the training data, it comes back as a single entity value, whereas multi-word entities that are only in the lookup table give me this issue.
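To illustrate why a lookup table alone may not produce a single multi-word entity: as far as I understand, the lookup entries are compiled into one case-insensitive regex that only supplies a per-token feature to the CRF, so the model still has to learn from annotated examples how to group the flagged tokens. Below is a minimal sketch of that idea, not Rasa's actual code; `build_lookup_regex` and `token_flags` are hypothetical names.

```python
import re

def build_lookup_regex(entries):
    # Longest entries first so multi-word names win over shorter alternatives.
    ordered = sorted(entries, key=len, reverse=True)
    alternatives = "|".join(r"\b" + re.escape(e) + r"\b" for e in ordered)
    return re.compile("(" + alternatives + ")", re.IGNORECASE)

def token_flags(text, pattern):
    # Whitespace tokenization, keeping character offsets for each token.
    tokens = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]
    spans = [(m.start(), m.end()) for m in pattern.finditer(text)]
    flags = []
    for word, start, end in tokens:
        # A token is flagged if it overlaps any lookup match.
        hit = any(start < s_end and end > s_start for s_start, s_end in spans)
        flags.append((word, hit))
    return flags

pattern = build_lookup_regex(["Hong Kong", "Belgium", "St. Lucia"])
flags = token_flags("flights to hong kong please", pattern)
print(flags)
```

Both “hong” and “kong” get the same flag, but nothing in the feature itself says they belong to one entity; that grouping has to come from the tagging scheme and the training examples.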
It should not be related to the lookup tables. It seems to be related to the BILOU_flag. Can you try to train your bot with the following pipeline and check if it works? Thanks.
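As background on what the BILOU_flag controls: in the BILOU scheme each entity token is tagged B (beginning), I (inside), L (last) or U (unit, a single-token entity), and everything else is O. The sketch below only illustrates the tagging scheme under the assumption of whitespace tokenization; `bilou_tags` is a hypothetical helper and not part of Rasa.

```python
def bilou_tags(tokens, entity_tokens, label):
    """Tag `tokens` with BILOU labels for the first occurrence of `entity_tokens`."""
    tags = ["O"] * len(tokens)
    n = len(entity_tokens)
    if n == 0:
        return tags
    lowered = [t.lower() for t in tokens]
    target = [t.lower() for t in entity_tokens]
    for i in range(len(tokens) - n + 1):
        if lowered[i:i + n] == target:
            if n == 1:
                tags[i] = "U-" + label          # single-token entity
            else:
                tags[i] = "B-" + label          # first token
                for j in range(i + 1, i + n - 1):
                    tags[j] = "I-" + label      # middle tokens, if any
                tags[i + n - 1] = "L-" + label  # last token
            break
    return tags

tags_multi = bilou_tags("flights to hong kong".split(), ["hong", "kong"], "country")
tags_single = bilou_tags("flights to belgium".split(), ["belgium"], "country")
print(tags_multi)
print(tags_single)
```

With this scheme a two-word country is learned as a B/L pair rather than two independent tokens, which is why the flag matters for multi-word values.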
Yes, I did the same thing. Basically, the training data should cover a few values of each type of variation present in the lookup table; that way the model learns better. @Tanja
Hmm, similar problem here. I added several examples, but it is not quite there yet.
Here’s what I have in nlu.md (listing only the relevant examples):
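A rough sketch of how one could pick a few lookup values per variation to annotate: group the entries by a simple “shape” (word count, and whether they contain punctuation such as “.” or “&”), then sample a couple from each group. The shape rule, the template sentence and the function names are assumptions made for illustration only.

```python
import random
from collections import defaultdict

def shape_of(value):
    # Shape = (number of words, contains punctuation like "." or "&").
    words = len(value.split())
    has_punct = any(not (ch.isalnum() or ch.isspace()) for ch in value)
    return (words, has_punct)

def sample_examples(values, per_shape=2, seed=0):
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    groups = defaultdict(list)
    for v in values:
        groups[shape_of(v)].append(v)
    lines = []
    for shape in sorted(groups):
        members = sorted(groups[shape])
        picked = rng.sample(members, min(per_shape, len(members)))
        for value in picked:
            lines.append(
                f"- list all [experts](expert_name) in the country of [{value}](country)"
            )
    return lines

countries = ["Sierra Leone", "Belgium", "Palau", "Hong Kong", "St. Lucia", "Aland Islands"]
examples = sample_examples(countries)
print("\n".join(examples))
```

The printed lines can be pasted into nlu.md under the relevant intent; in practice you would also vary the sentence template rather than reuse a single one.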
- list all [experts](expert_name) in the country of [South Africa]{"entity":"country", "value":"South Africa"}
- list all [experts](expert_name) in the country of [New Zealand]{"entity":"country", "value":"New Zealand"}
- [United Kingdom](country)
- [United States](country)
It still didn’t catch “El Salvador”: it extracted “Salvador” as the country name and dropped “El”. The same goes for “Marshall Islands”, where it picked up only “Islands”. Maybe it recognizes “Marshall” as a person’s name and drops it?
Do I need to add more samples?
I also have countries with “&” in their names. It looks like I will need to add a few samples of those as well for it to work?
Hi, can I get some guidance please? I’m using the CRFEntityExtractor with BILOU_flag set to True, and my model seems to be splitting the entities. I was under the impression that if I needed my compound entities to be treated as a single entity, I needed to use BILOU tagging, which had been working fine for me until now. I recently switched from SklearnIntentClassifier to DIETClassifier (configured for intent classification only).
I know one solution is to add training phrases, and I’ve done that, but since I can’t possibly add all entities to the training data, I need a better solution. A solution at the config level, that is.
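This is not a config-level fix, but as a stopgap one could merge split entities after parsing. The sketch below assumes entity dicts with `start`, `end`, `value` and `entity` keys and joins neighbours of the same type that are separated only by whitespace; `merge_adjacent_entities` is a hypothetical helper, not a Rasa function.

```python
def merge_adjacent_entities(text, entities):
    merged = []
    for ent in sorted(entities, key=lambda e: e["start"]):
        prev = merged[-1] if merged else None
        if (
            prev is not None
            and prev["entity"] == ent["entity"]
            and ent["start"] >= prev["end"]
            and text[prev["end"]:ent["start"]].strip() == ""
        ):
            # Extend the previous entity to cover this one.
            prev["end"] = ent["end"]
            prev["value"] = text[prev["start"]:prev["end"]]
        else:
            merged.append(dict(ent))  # copy so the input is not mutated
    return merged

text = "show experts in hong kong"
split = [
    {"start": 16, "end": 20, "value": "hong", "entity": "country"},
    {"start": 21, "end": 25, "value": "kong", "entity": "country"},
]
merged = merge_adjacent_entities(text, split)
print(merged)
```

Note that this would also glue together two genuinely separate countries typed next to each other without a separator, so it is only a workaround.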
@iszainab Can you please share some more details? What Rasa version are you using? What does your pipeline look like? And can you please give an example of an entity that is split? Thanks.
Just a thought regarding your case: could your issue have something to do with having two different entity extractors in your pipeline, “CRFEntityExtractor” and “DIETClassifier”?
I had an issue with that myself some time ago. (I am a newbie with Rasa)
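One way to check whether two extractors are both contributing is to group the entities in a parse result by their `extractor` field. The sample result below is made up for illustration, and `entities_by_extractor` is a hypothetical helper.

```python
from collections import defaultdict

def entities_by_extractor(parse_result):
    grouped = defaultdict(list)
    for ent in parse_result.get("entities", []):
        name = ent.get("extractor", "unknown")
        grouped[name].append((ent["entity"], ent["value"]))
    return dict(grouped)

# Hypothetical parse result, shaped like the JSON returned for a parsed message.
sample = {
    "text": "show experts in hong kong",
    "entities": [
        {"entity": "country", "value": "hong", "start": 16, "end": 20,
         "extractor": "CRFEntityExtractor"},
        {"entity": "country", "value": "hong kong", "start": 16, "end": 25,
         "extractor": "DIETClassifier"},
    ],
}
grouped = entities_by_extractor(sample)
print(grouped)
```

If the same span shows up under both extractors, configuring only one of them to do entity extraction is the first thing to try.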