Which one is better for normalization, the spaCy tokenizer or the whitespace tokenizer? Or can we use them both in a pipeline?
You only want to use one tokenizer per pipeline. The whitespace tokenizer creates a new token every time it runs into whitespace. The spaCy tokenizer adds some additional, language-specific rule checking after the whitespace splitting (https://spacy.io/usage/linguistic-features#tokenization). Which tokenizer you use will depend on the rest of your pipeline: different pipeline components rely on different tokenizers.
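To make the difference concrete, here's a small sketch comparing a plain whitespace split with spaCy's rule-based tokenizer. It assumes spaCy is installed (`pip install spacy`); `spacy.blank("en")` gives you an English tokenizer without downloading any pretrained model.

```python
import spacy

text = "Don't tokenize this, please!"

# Whitespace tokenizer: split on spaces only.
# Punctuation and contractions stay attached to words.
ws_tokens = text.split()
print(ws_tokens)  # ["Don't", 'tokenize', 'this,', 'please!']

# spaCy tokenizer: whitespace split plus language-specific rules
# (contractions expanded, punctuation split off as separate tokens).
nlp = spacy.blank("en")
spacy_tokens = [t.text for t in nlp(text)]
print(spacy_tokens)  # ['Do', "n't", 'tokenize', 'this', ',', 'please', '!']
```

Notice that the whitespace split leaves `"this,"` and `"please!"` as single tokens, so downstream components would treat `"this"` and `"this,"` as unrelated words, which is usually not what you want.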
@rctatman Assume I'm creating training data from scratch in English; which one would you suggest?
If you’re planning on using spaCy at all (which I would, it’s a great library), use their tokenizer.
So you're saying the spaCy tokenizer is better than the whitespace tokenizer? Okay, thanks, I'll try it out.