How to handle questions in languages like Greek?

Hi all,

I am having trouble distinguishing questions from non-questions. For example, in English you can say “Is it good?” and “It is good.” The same sentences in Greek are “Είναι καλό;” and “Είναι καλό.” They have the same word order; the only difference is the question mark (";" in Greek).
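To make the point concrete, here is a minimal plain-Python sketch (not Rasa code) showing that the two Greek sentences are identical once the final mark is dropped. The helper names `strip_final_mark` and `is_greek_question` are hypothetical. The sketch also covers a Unicode detail: Greek text can contain either U+003B SEMICOLON or U+037E GREEK QUESTION MARK, and NFC normalization maps U+037E to U+003B.

```python
import unicodedata

# The two example sentences from above.
question = "Είναι καλό;"
statement = "Είναι καλό."


def strip_final_mark(text: str) -> str:
    """Drop one trailing punctuation character (Unicode category P*), if present."""
    if text and unicodedata.category(text[-1]).startswith("P"):
        return text[:-1]
    return text


def is_greek_question(text: str) -> bool:
    """Hypothetical helper: True if the text ends with a Greek question mark.

    NFC normalization turns U+037E GREEK QUESTION MARK into U+003B SEMICOLON,
    so checking for ";" after normalizing covers both code points.
    """
    normalized = unicodedata.normalize("NFC", text.rstrip())
    return normalized.endswith(";")


print(strip_final_mark(question) == strip_final_mark(statement))  # True
print(is_greek_question(question), is_greek_question(statement))  # True False
print(is_greek_question("Είναι καλό\u037e"))  # True
```

Without the final character there is nothing left in the text that tells the two sentences apart.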

So, I am wondering what happens in the tokenizer. I am using the LanguageModelTokenizer. Is there a way (e.g. in rasa shell) to see how the tokenizer tokenises the sentence, and whether the question mark is kept?

I did the “regular” thing of adding print statements to the Rasa source files, and I saw the following:

The language model's tokenizer never sees the original message. The message is first passed through the WhitespaceTokenizer, which removes all punctuation. So my “;” does not survive, and the rest of the classifiers cannot distinguish questions from their non-question equivalents.
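The effect described above can be reproduced with a small plain-Python sketch. `strip_punct_tokenize` is only a rough approximation of a punctuation-stripping whitespace tokenizer; it is NOT the regex Rasa's WhitespaceTokenizer actually uses. `keep_question_mark_tokenize` is a hypothetical alternative that keeps ";" (and U+037E) as its own token, so that downstream components would still have a feature that separates questions from statements.

```python
import re
from typing import List


def strip_punct_tokenize(text: str) -> List[str]:
    # Rough approximation only: replace every non-word, non-space character
    # with a space, then split on whitespace. This is not Rasa's actual regex.
    return re.sub(r"[^\w\s]", " ", text).split()


def keep_question_mark_tokenize(text: str) -> List[str]:
    # Hypothetical alternative: emit word runs plus ";" / U+037E as separate
    # tokens; all other punctuation is still dropped.
    return re.findall(r"\w+|[;\u037e]", text)


q, s = "Είναι καλό;", "Είναι καλό."
print(strip_punct_tokenize(q) == strip_punct_tokenize(s))  # True: identical tokens
print(keep_question_mark_tokenize(q))  # ['Είναι', 'καλό', ';']
print(keep_question_mark_tokenize(s))  # ['Είναι', 'καλό']
```

With the first tokenizer both sentences become the same token list, which matches what the print statements showed; with the second, the question keeps a distinguishing ";" token.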