Hi,
I need to detect employees' contact details from the database. To detect the entities I am using lookup tables. The name and surname are unfeaturized:
Now I am getting very different confidence scores for the detected intent depending on which name is given, for example:
User: What is phone number to John Smith
intent: ask_phone, confidence 0.8
User: What is phone number to Janet Jackson
intent: ask_phone, confidence 0.26
My guess is that the name and surname tokens get featurized, and those features are then used for intent classification. If so, how can I avoid that?
I am attaching my config file, config.yml.
I have also added a custom NLU component that anonymizes the names in the message after the entities are detected but before the featurizer runs, but it does not work. Is this because the tokens are created by the tokenizer and the featurizer only extracts features from those existing tokens, so changing the message text afterwards has no effect?
If so, would it be a good approach to remove the tokens belonging to the entities and add new, anonymized tokens instead? Or should I just change the feature vectors stored in those tokens?
Can you point me to some resources that could help me with that?
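Here is a rough sketch of the "replace the entity tokens with anonymized tokens" idea, written outside of Rasa. It replaces each detected entity span in the text with a generic placeholder and then re-tokenizes, so the downstream featurizer would see the same input no matter which name was used. This is plain Python, not Rasa's component API. `Entity`, `anonymize` and `tokenize` are hypothetical names, and the whitespace tokenizer only stands in for whatever tokenizer config.yml actually uses. In a real pipeline this step would presumably have to run before the tokenizer, or rebuild the tokens itself, because rewriting the text after tokenization would not change tokens that already exist.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    # Hypothetical: character span and label of a detected entity,
    # e.g. as produced by a lookup-table based extractor
    start: int
    end: int
    label: str


def anonymize(text: str, entities: List[Entity]) -> str:
    """Replace each entity span with an upper-cased placeholder such as "NAME"."""
    # Replace right-to-left so the offsets of earlier entities stay valid
    for ent in sorted(entities, key=lambda e: e.start, reverse=True):
        text = text[:ent.start] + ent.label.upper() + text[ent.end:]
    return text


def tokenize(text: str) -> List[str]:
    # Simple whitespace tokenizer, standing in for the pipeline's real tokenizer
    return text.split()


text = "What is phone number to Janet Jackson"
ents = [Entity(24, 29, "name"), Entity(30, 37, "surname")]
print(tokenize(anonymize(text, ents)))
# prints ['What', 'is', 'phone', 'number', 'to', 'NAME', 'SURNAME']
```

With this, "What is phone number to John Smith" (entities at 24-28 and 29-34) produces exactly the same tokens, so the intent classifier would no longer be influenced by which particular name appears.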
Or maybe there is a completely different solution to my problem?