Does punctuation affect the model?

For example, does adding or removing question marks in the training examples for intents affect the prediction model? Similarly, does a user’s utterance containing different punctuation, such as a question mark, affect the prediction model? I assumed not at first, but I’ve seen some random instances where punctuation does seem to have an effect. If so, we will probably want to add something to our pipeline that strips punctuation out to avoid random bias based on punctuation…
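
Something like this simple normalization step is what I have in mind (just a sketch in plain Python, not an actual pipeline component; the function name is made up):

```python
# Sketch of the kind of preprocessing I mean: strip punctuation from training
# examples and incoming utterances before they reach the model.
import string

_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def strip_punctuation(utterance: str) -> str:
    """Drop ASCII punctuation and collapse extra whitespace.

    Note: string.punctuation includes the apostrophe, so "There's" becomes "Theres".
    """
    return " ".join(utterance.translate(_PUNCT_TABLE).split())

print(strip_punctuation("Is there a bear?!"))  # -> "Is there a bear"
```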

Hi Tatiana! It depends on what pipeline you’re using. If you’re using a tokenizer that treats punctuation as tokens, like the SpaCy tokenizer, yes.
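
For example, with a bare spaCy English tokenizer (assuming spaCy is installed; no model download needed), punctuation comes out as its own tokens:

```python
# Punctuation becomes its own token with spaCy's English tokenizer.
# spacy.blank("en") gives a tokenizer-only pipeline, so no model download is needed.
import spacy

nlp = spacy.blank("en")
print([token.text for token in nlp("There's a bear!")])
# -> ['There', "'s", 'a', 'bear', '!']
```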

If you’re curious about the effect of punctuation on your model, I’d suggest an A/B test before removing it entirely. There is some information present in punctuation, and removing it may end up changing the performance of your models. (Consider the difference between “There’s a bear!” and “There’s a bear?”, for example. You’d probably want those two utterances classified into different intents.)
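
Here’s a rough sketch of the A/B idea (not Rasa-specific, and the tiny dataset below is made up purely for illustration): train the same simple classifier twice, once with punctuation kept as features and once with it dropped, and compare the scores.

```python
# A/B comparison sketch: same classifier, with vs. without punctuation features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

texts = [
    "There's a bear!", "Watch out, run!", "Help, it's right behind you!",
    "There's a bear?", "Is that a bear?", "Did you see something over there?",
]
intents = ["warn", "warn", "warn", "ask", "ask", "ask"]

with_punct = make_pipeline(
    TfidfVectorizer(token_pattern=r"[\w']+|[!?]"),  # '!' and '?' become features
    LogisticRegression(),
)
without_punct = make_pipeline(
    TfidfVectorizer(token_pattern=r"[\w']+"),       # punctuation is ignored
    LogisticRegression(),
)

print("with punctuation:   ", cross_val_score(with_punct, texts, intents, cv=3).mean())
print("without punctuation:", cross_val_score(without_punct, texts, intents, cv=3).mean())
```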

Hope that helps! :slight_smile:

I think the addition and omission of punctuation really does affect this. By the way, I recently decided to start learning English grammar again. Unfortunately, I had not studied English grammar for a long time and had forgotten a lot of things. Now I have started studying point of view worksheet 1, which helps me remember how to express my emotions and feelings grammatically in text. Worksheets like these really help you recall many grammatical points you might have forgotten.

Hello,

I have the same problem. It is true that the SpaCy tokenizer treats punctuation as tokens, but as far as I can tell from the pkl file of the CountVectorsFeaturizer, punctuation is not considered.
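
Here is a small reproduction of what I mean, assuming the CountVectorsFeaturizer is backed by sklearn’s CountVectorizer:

```python
# The featurizer side of the discrepancy: sklearn's CountVectorizer, with its
# default token_pattern, only keeps alphanumeric tokens of 2+ characters,
# so punctuation never reaches the saved vocabulary.
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer()  # default token_pattern=r"(?u)\b\w\w+\b"
vectorizer.fit(["There's a bear!", "There's a bear?"])
print(vectorizer.get_feature_names_out())  # -> ['bear' 'there'], no '!' or '?'
```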

Any explanation, please?