Tokenizer_jieba dictionary_path does not work

For example, "ABC" is sometimes tokenized as "AB", "C" and sometimes as "BC", so I added these words to a dictionary under dictionary_path. But when I run train-nlu I still get "entities must span whole tokens", "Wrong entity start", and "Wrong entity end" errors. Many other training examples report the same error, not just this one.
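For reference, this is roughly how jieba user dictionaries behave on their own, outside of Rasa. A minimal sketch; the term `云计算技术` stands in for the "ABC" above, and `user_dict.txt` is a placeholder file name:

```python
import jieba

text = "云计算技术"  # placeholder for the question's "ABC": a compound domain term

# Without a user dictionary, jieba may split the term inconsistently,
# e.g. ["云计算", "技术"], depending on context.
print(list(jieba.cut(text)))

# A jieba user dictionary is a plain text file, one entry per line:
#   word [frequency] [POS tag]   (frequency and tag are optional)
with open("user_dict.txt", "w", encoding="utf-8") as f:
    f.write("云计算技术 100 n\n")

jieba.load_userdict("user_dict.txt")
print(list(jieba.cut(text)))  # the term should now come out as one token
```

If the entity annotations in the training data don't line up with jieba's token boundaries, the trainer raises exactly the "entities must span whole tokens" errors described above, which is why getting the custom dictionary loaded matters.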

Please submit an issue on GitHub, that’s the better place for such things.

@howlanderson thanks for sending them to GitHub :slight_smile: @weilingfeng1996 yeah, please open a GitHub issue; if there's an actual bug in our code, it's always best to put it there.

What you need to specify is the path to a directory, not the path to the dictionary file itself (so it's best that the directory contains no stray files that jieba might try to load). The author probably had multiple dictionaries organized by category in mind, but I think both forms ought to be supported here.
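In other words, the tokenizer presumably treats every file in that directory as a jieba user dictionary. A minimal sketch of such a directory-based loader, assuming the hypothetical directory `data/jieba_userdict` contains only files in jieba's user-dictionary format:

```python
import glob
import os

import jieba

def load_user_dicts(dictionary_dir: str) -> None:
    """Load every file in dictionary_dir as a jieba user dictionary."""
    for path in glob.glob(os.path.join(dictionary_dir, "*")):
        if os.path.isfile(path):
            # Each file must follow jieba's format: word [freq] [POS tag]
            jieba.load_userdict(path)

load_user_dicts("data/jieba_userdict")  # hypothetical directory path
```

So in the config, point dictionary_path at the directory (e.g. `data/jieba_userdict`), not at an individual `.txt` file inside it.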