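- How to specify a user dictionary in spaCy for a non-English language. Tokenizer quality affects every downstream step of the NLP pipeline, such as entity extraction, so domain-specific words should be registered with the segmenter.
For example, for Chinese (this applies when the Chinese tokenizer uses the pkuseg segmenter, and nlp is an already loaded pipeline such as zh_core_web_sm):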
nlp.tokenizer.pkuseg_update_user_dict(['yyds','cx-4'])
- How to specify custom entities in spaCy for Rasa. Add an entity_ruler component to the pipeline and give it patterns for the custom labels:
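import spacy
# Load the Chinese pipeline (its tokenizer uses the pkuseg segmenter).
nlp = spacy.load('zh_core_web_sm')
# Register custom words so the segmenter can keep each one as a single token.
nlp.tokenizer.pkuseg_update_user_dict(['yyds', 'cx-4'])
# add_pipe appends the ruler at the end of the pipeline, after "ner", so by default
# it does not overwrite entities that overlap ones already predicted by the model.
ruler = nlp.add_pipe("entity_ruler")
# Each pattern maps an exact phrase to a custom entity label.
patterns = [
    {"label": "net_hot_word", "pattern": "yyds"},
    {"label": "car_name", "pattern": "cx-4"},
]
ruler.add_patterns(patterns)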