How to handle questions in languages like Greek?

petasis · October 14, 2020, 9:03am

Hi all,

I am having trouble distinguishing questions from non-questions. For example, in English you can say “Is it good?” and “It is good”. The same sentences in Greek are “Είναι καλό;” and “Είναι καλό.”. They have the same word order; the only difference is the question mark (“;” in Greek).
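
The two Greek sentences differ only in their final character. Greek text may contain either the ASCII semicolon (U+003B) or the dedicated GREEK QUESTION MARK (U+037E). The latter canonically decomposes to the former, so Unicode NFC normalization maps both to “;”. Below is a minimal sketch using only Python's standard `unicodedata` module. The helper name `is_greek_question` is hypothetical, and it only checks the last non-space character.

```python
import unicodedata

# U+037E GREEK QUESTION MARK canonically decomposes to U+003B (";"),
# so NFC normalization turns it into a plain ASCII semicolon.
GREEK_QM = "\u037e"

def is_greek_question(text: str) -> bool:
    """Hypothetical helper: True if the text ends with a Greek question mark."""
    normalized = unicodedata.normalize("NFC", text).rstrip()
    return normalized.endswith(";")

question = "Είναι καλό;"
statement = "Είναι καλό."

# Same words in the same order; only the final character differs.
print(question[:-1] == statement[:-1])             # True
print(is_greek_question(question))                 # True
print(is_greek_question(statement))                # False
print(is_greek_question("Είναι καλό" + GREEK_QM))  # True
```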

So I am wondering what happens in the tokenizer. I am using the LanguageModelTokenizer. Is there a way (e.g. in rasa shell) to see how the tokenizer tokenizes a sentence, and whether the question mark is kept?
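
Independently of any Rasa tooling, one rough way to check whether the question mark survives a tokenization step is to run both sentences through the same tokenizer function and compare the token lists. This sketch assumes the tokenizer is a plain callable from a string to a list of strings. `compare_tokens` is a hypothetical helper, and Python's built-in `str.split` only stands in for a real tokenizer; it is not Rasa's WhitespaceTokenizer or LanguageModelTokenizer.

```python
from typing import Callable, List

# Assumption: a tokenizer is any function mapping a string to a list of tokens.
Tokenizer = Callable[[str], List[str]]

def compare_tokens(tokenize: Tokenizer, a: str, b: str) -> bool:
    """Hypothetical helper: print both token lists and report whether they differ."""
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    print(tokens_a)
    print(tokens_b)
    return tokens_a != tokens_b

# str.split keeps punctuation attached to the preceding word.
print(compare_tokens(str.split, "Είναι καλό;", "Είναι καλό."))
# ['Είναι', 'καλό;']
# ['Είναι', 'καλό.']
# True
```

If the function returns False for a question and its statement form, the question mark was lost at that step.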

petasis · October 14, 2020, 9:56am

I did the usual thing of adding print statements to the Rasa source files, and I saw the following:

The language model's tokenizer does not see the original message. The message first passes through the WhitespaceTokenizer, which removes all punctuation. So my “;” does not survive, and the rest of the classifiers cannot distinguish questions from their non-question equivalents.
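
The effect described above can be illustrated with a simplified tokenizer. The regex below is an assumption that only approximates “remove punctuation, then split on whitespace”; it is not Rasa's actual WhitespaceTokenizer code, and both function names are hypothetical. The second function shows, for contrast, a tokenizer that keeps each punctuation mark as its own token, so the question and the statement stay distinguishable.

```python
import re
from typing import List

def strip_punct_tokenize(text: str) -> List[str]:
    """Simplified approximation: drop punctuation, then split on whitespace.

    Assumption for illustration only; NOT Rasa's WhitespaceTokenizer.
    """
    # Replace every run of characters that are neither word characters nor
    # whitespace with a space. In Python 3, \w matches Greek letters too.
    cleaned = re.sub(r"[^\w\s]+", " ", text)
    return cleaned.split()

def keep_punct_tokenize(text: str) -> List[str]:
    """Hypothetical variant: words plus each punctuation mark as its own token."""
    return re.findall(r"\w+|[^\w\s]", text)

q, s = "Είναι καλό;", "Είναι καλό."
print(strip_punct_tokenize(q) == strip_punct_tokenize(s))  # True: the ';' is lost
print(keep_punct_tokenize(q))  # ['Είναι', 'καλό', ';']
print(keep_punct_tokenize(s))  # ['Είναι', 'καλό', '.']
```

Whether the featurizers and classifiers later in a given pipeline would actually make use of a separate “;” token is not something this sketch can show.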
