But when I enter numeric entries like 12345, 123456, or 12345678, which are fewer than 10 digits, they are still being recognized as the NPI entity.
How do I resolve this?
hi @Akhil - can you please try removing the LexicalSyntacticFeaturizer from your pipeline? It adds a feature to your feature vector which says 'is this token a number', which is probably the cause. You might also try adding some NLU examples where you have numbers which are not ten digits long and which are not annotated as entities. Then your model has a chance to learn that the length is important.
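For example, counter-examples in your NLU data could look like this (the intent name and sentences are just illustrative), with the shorter numbers deliberately left unannotated:

## intent:provide_npi
- my NPI is [1234567890](npi)
- NPI number [9876543210](npi)
- my reference number is 12345
- the code is 123456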
I tried your suggestion but it didn't help.
I have another entity called Claim ID, which is an alphanumeric string of length 1-50. I have around 150 alphanumeric strings generated using a random string generator, and also a regex like this.
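(an illustrative pattern matching that description, in markdown training-data format - the exact regex may have differed:)

## regex:claim_id
- [a-zA-Z0-9]{1,50}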
The input for claim_id is getting split and recognized as 3 separate entities. How do I solve this issue?
Should I stop using ConveRT and shift to WhitespaceTokenizer or SpacyNLP?
I used spaCy instead of ConveRT, and the claim IDs and NPIs are being extracted perfectly without getting split. But there is intent misclassification with spaCy (probably because of low training data).
ConveRT is performing very well at intent classification but is splitting entities during extraction. Please suggest a way to resolve this with ConveRT, without the entities getting split.
hi @Akhil - you may have seen we are working towards the 2.0 release & there are already a few alpha releases out there. 2.0 includes a RegexEntityExtractor component which should do exactly what you want.
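In 2.0 it goes into the pipeline like any other component, roughly like this (a minimal sketch - the surrounding components are only illustrative, and the options shown are the documented defaults):

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexEntityExtractor
    # match case-insensitively, and use both regexes and lookup tables
    case_sensitive: false
    use_lookup_tables: true
    use_regexes: true
  - name: CountVectorsFeaturizer
  - name: DIETClassifier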
Hi @amn41. I added it as a custom component and it works as expected. But I see that multiple entities with the same name are extracted. Is there a way I can specify the name of the entity extractor in slot mappings?
I can turn off entity extraction for diet but I need it for person_name entity extraction.
hi @Akhil - you should be able to remove the entity annotations for the claim_id entity from your training data, since the regex extractor doesn't use them anyway.
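For example (a made-up sentence), an annotated example like

- my claim id is [x7Gh2kQ](claim_id)

would simply become

- my claim id is x7Gh2kQ

leaving the pattern matching to the RegexEntityExtractor.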
Hi @amn41.
I removed all examples of claim_id and npi as you suggested.
I am getting the following warning message, and no entities are being extracted in rasa shell nlu.
2020-09-09 15:16:01 INFO rasa.nlu.model - Starting to train component RegexEntityExtractor
/home/akhil/office/chatbot/dev/venv/lib/python3.6/site-packages/rasa/utils/common.py:363: UserWarning: No lookup tables or regexes defined in the training data that have a name equal to any entity in the training data. In order for this component to work you need to define valid lookup tables or regexes in the training data.
@Akhil - ah, my mistake. In fact you still have to provide at least one example of each of these entities. This is because we don’t have access to the domain inside the NLU component, and so we have to check the training data to see what entities exist.
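A single annotated example per entity is enough to register it, e.g. (sentences made up):

## intent:inform
- my claim id is [A1B2C3](claim_id)
- my NPI is [1234567890](npi)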
Hi @amn41. Oh ok, thank you for the pointer. I added the following 2 lines and the issue seems to be resolved. Now I don't need to add even one example in the training data - defining the regex is enough.
Please correct me if I have done anything wrong here. Thank you very much for your time and help.
from typing import Dict, List, Text

# Import path as of the Rasa 2.0 alphas; it may differ in other versions.
from rasa.nlu.training_data import TrainingData


def extract_patterns(
    training_data: TrainingData,
    use_lookup_tables: bool = True,
    use_regexes: bool = True,
    use_only_entities: bool = False,
) -> List[Dict[Text, Text]]:
    """Extract a list of patterns from the training data.

    The patterns are constructed using the regex features and lookup tables defined
    in the training data.

    Args:
        training_data: The training data.
        use_only_entities: If True, only lookup tables and regex features with a name
            equal to an entity are considered.
        use_regexes: Boolean indicating whether to use regex features or not.
        use_lookup_tables: Boolean indicating whether to use lookup tables or not.

    Returns:
        The list of regex patterns.
    """
    # Add the names of entities which have a defined regex but zero examples
    # in the training data, so they pass the entity-name check below.
    for regex in training_data.regex_features:
        training_data.entities.add(regex["name"])

    if not training_data.lookup_tables and not training_data.regex_features:
        return []

    patterns = []
    if use_regexes:
        # _collect_regex_features and _convert_lookup_tables_to_regex are the
        # existing private helpers defined elsewhere in this Rasa module.
        patterns.extend(_collect_regex_features(training_data, use_only_entities))
    if use_lookup_tables:
        patterns.extend(
            _convert_lookup_tables_to_regex(training_data, use_only_entities)
        )
    return patterns
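For context, the patch works because the RegexEntityExtractor only keeps patterns whose name matches an entity found in the training data - that is what the warning above was complaining about - so adding each regex name to training_data.entities makes that check pass without any annotated examples. If the patched function lives in your own component file rather than in the installed rasa package, the component is referenced by module path in the pipeline (the path and class name here are hypothetical):

pipeline:
  - name: custom_components.regex_extractor.PatchedRegexEntityExtractor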