Sorry, I’m not sure I’ve understood your question. Are you asking why I’m suggesting you remove the duplicates?
For the intents, the examples for myname and name are very similar. This makes it difficult for the classifier to choose between them. You can keep both if you change the examples to be more distinct for each intent.
For the entities, it is a similar issue: there’s not enough distinction between PERSON and name.
I think that SpacyEntityExtractor may be having trouble extracting some of the names because the underlying spaCy model hasn’t seen them in its training data. You can try using a larger model, but I think you might be best off using a lookup table in addition. If you know that most of your users will have Ethiopian names, then you could pull the most common ones and add them to a lookup table (for the entity PERSON). You will need to use the RegexEntityExtractor in your pipeline.
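As a rough sketch of what that could look like, assuming the Rasa 2.x/3.x YAML training data format: the three names below are only illustrative placeholders, not a vetted list of common names, and the file paths are the usual defaults rather than anything taken from your project. The lookup table is named after the entity it should fill.

```yaml
# data/nlu.yml (assumed default path)
nlu:
- lookup: PERSON        # table name matches the entity to extract
  examples: |
    - Abebe
    - Tigist
    - Kebede
```

```yaml
# config.yml (assumed default path): add this next to your existing components
pipeline:
  # ... your existing tokenizer, featurizers, SpacyEntityExtractor, etc.
  - name: RegexEntityExtractor
    use_lookup_tables: true   # match entries from the lookup table above
    use_regexes: false        # assumption: no regex patterns are defined
    case_sensitive: false     # so "abebe" and "Abebe" both match
```

If I remember correctly, the entity also needs a couple of annotated examples in your NLU data so that it is registered at training time, so it is worth checking the docs for your Rasa version. Keep in mind that both extractors may return PERSON for the same text, so you may see the same name extracted twice.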
See this post for more information.