If I want to recognize Tom as a person entity, is it this:
- It is [Tom](person)'s
That should work. Both tokenizers (WhitespaceTokenizer and SpacyTokenizer) will split the text into "It", "is", "Tom", "s".
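To see why the annotation works, here is a minimal sketch that strips the `[Tom](person)` markup and checks that the entity boundaries fall on token boundaries. It is not Rasa's code: `parse_annotated`, `tokenize`, and `is_aligned` are hypothetical helpers, and the `\w+` regex is only a rough stand-in for a tokenizer that splits on the apostrophe as described above.

```python
import re

# Matches the markdown-style annotation [entity text](entity label).
ENTITY_PATTERN = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<label>[^)]+)\)")


def parse_annotated(example):
    """Strip [text](label) markup; return the plain text and entity spans."""
    plain = ""
    entities = []
    last = 0
    for match in ENTITY_PATTERN.finditer(example):
        plain += example[last:match.start()]
        start = len(plain)
        plain += match.group("text")
        entities.append({
            "start": start,
            "end": len(plain),
            "entity": match.group("label"),
            "value": match.group("text"),
        })
        last = match.end()
    plain += example[last:]
    return plain, entities


def tokenize(text):
    """Stand-in tokenizer (assumption): runs of word characters with offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+", text)]


def is_aligned(entity, tokens):
    """True if the entity starts and ends exactly on token boundaries."""
    starts = {start for _, start, _ in tokens}
    ends = {end for _, _, end in tokens}
    return entity["start"] in starts and entity["end"] in ends


plain, entities = parse_annotated("It is [Tom](person)'s")
tokens = tokenize(plain)
print(plain)
print([token for token, _, _ in tokens])
print(is_aligned(entities[0], tokens))
```

Because "Tom" ends up as its own token, the entity span (characters 6 to 9) lines up with a token. If the tokenizer kept "Tom's" as a single token, the same check would fail, which is the situation the question is trying to avoid.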
Thanks, @Tanja.
Hi @Tanja, can you confirm the WhitespaceTokenizer would break on the single quote?
Here it says it won’t (the big “Warning” in the middle of the page):
https://rasa.com/docs/rasa/user-guide/evaluating-models/#comparing-nlu-pipelines
Good catch. The warning is incorrect. I can confirm that the WhitespaceTokenizer splits "Brian's" into multiple tokens.
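As a rough illustration of the difference, the sketch below contrasts a pure whitespace split (the behavior the docs warning seems to assume) with a split that also breaks on punctuation. Both functions are hypothetical stand-ins written for this comparison, not the actual WhitespaceTokenizer implementation.

```python
import re


def split_on_whitespace_only(text):
    """Naive split: punctuation stays attached to the word."""
    return text.split()


def split_on_punctuation_too(text):
    """Assumed approximation: keep only runs of word characters."""
    return re.findall(r"\w+", text)


text = "Brian's"
whitespace_only = split_on_whitespace_only(text)
punctuation_too = split_on_punctuation_too(text)
print(whitespace_only)   # ["Brian's"]
print(punctuation_too)   # ['Brian', 's']
```

With the second behavior, an annotation such as `[Brian](person)'s` ends on a token boundary, so the entity can be aligned to tokens.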
I created an issue to update the docs: "Incorrect warning about WhitespaceTokenizer" (Issue #4605, RasaHQ/rasa on GitHub).
Thanks again.