Data format of Rasa for Arabic

merveenoyan · June 9, 2021, 12:55pm

Hi there,

I’m using Rasa 1.10.24. I need to build the same chatbot we already have in English, but for Arabic. I’m using DIETClassifier, WhitespaceTokenizer, CountVectorsFeaturizer and language: ar. I can’t read Arabic, and it’s written right to left, so I don’t understand how the entity annotations work. I looked at the NLU files in the Rasa Arabic PoCs and expected every annotation to look like [text](entity), but that doesn’t seem to be the case. We translate the NLU data directly with Google’s translation API. When I train NLU, I get warnings like this one:

rasa/utils/common.py:387: UserWarning: Misaligned entity annotation in message ‘ما هي العلامات المبكرة لخلل التنسج الصدري’ with intent ‘user_inform_health’. Make sure the start and end values of entities in the training data match the token boundaries (e.g. entities don’t include trailing whitespaces or punctuation).
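The warning means that the character span of an annotated entity does not start and end exactly where tokens start and end. Below is a simplified, standalone approximation of that check. It is not Rasa's own code, and it ignores the punctuation handling of the real WhitespaceTokenizer. The example entity is hypothetical, since the actual NLU file was not posted. One plausible cause in Arabic is that short prepositions such as ل (roughly "for" or "of") are written attached to the next word. A translated entity like خلل التنسج الصدري can therefore begin in the middle of the whitespace token لخلل.

```python
import re
from typing import List, Tuple


def whitespace_tokens(text: str) -> List[Tuple[int, int]]:
    """(start, end) character offsets of each whitespace-separated token.

    Simplified stand-in for Rasa's WhitespaceTokenizer: it does not strip
    punctuation or honour any of the tokenizer's other options.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def is_aligned(text: str, start: int, end: int) -> bool:
    """True if the entity span text[start:end] starts and ends on token boundaries."""
    tokens = whitespace_tokens(text)
    return start in {s for s, _ in tokens} and end in {e for _, e in tokens}


text = "ما هي العلامات المبكرة لخلل التنسج الصدري"
# Hypothetical annotation: the translated entity starts after the attached "ل".
entity = "خلل التنسج الصدري"
start = text.find(entity)
print(is_aligned(text, start, start + len(entity)))  # False: starts inside "لخلل"
```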

Can someone explain how entity annotations should be done?
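An annotation only depends on character offsets into the string. Those offsets follow logical (typing) order no matter how an editor draws right-to-left text. The following is a minimal sketch, assuming the Rasa 1.x Markdown training format ([text](entity_name)) and a hypothetical entity name condition. It wraps an entity found in a translated sentence and widens the span to whole whitespace tokens, so the result cannot trigger the misalignment warning.

```python
import re


def annotate(text: str, entity_text: str, entity_name: str) -> str:
    """Wrap entity_text in Rasa 1.x Markdown entity syntax: [text](entity_name).

    The span is widened to the enclosing whitespace tokens, so an entity that
    starts or ends inside a word (e.g. after an attached prefix) still lines
    up with token boundaries.
    """
    entity_text = entity_text.strip()  # leading/trailing spaces also misalign
    start = text.find(entity_text) if entity_text else -1
    if start == -1:
        raise ValueError(f"{entity_text!r} not found in {text!r}")
    end = start + len(entity_text)
    for m in re.finditer(r"\S+", text):
        if m.start() < start < m.end():
            start = m.start()  # entity begins mid-token: move to token start
        if m.start() < end < m.end():
            end = m.end()  # entity ends mid-token: move to token end
    return f"{text[:start]}[{text[start:end]}]({entity_name}){text[end:]}"


# "condition" is a hypothetical entity name used only for illustration.
print(annotate("what are the early signs of thoracic dysplasia", "thoracic dysplasia", "condition"))
print(annotate("ما هي العلامات المبكرة لخلل التنسج الصدري", "خلل التنسج الصدري", "condition"))
```

Widening keeps the attached preposition inside the entity (لخلل rather than خلل). If the bare value matters for slots, the Rasa 1.x Markdown synonym form [text](entity_name:value), with EntitySynonymMapper in the pipeline, could map it back. An Arabic speaker should confirm that choice.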

koaning · June 10, 2021, 7:21am

That’s strange.

Is there a snippet of the NLU file that you can pass along to me so I might be able to reproduce the error?

merveenoyan · June 10, 2021, 7:37am

Sure, I’ll ping you on Slack.

merveenoyan · June 10, 2021, 4:03pm

If anyone is looking at this thread: it turns out that working with right-to-left (RTL) languages in code editors like VSCode is problematic enough that the real solution is to have someone on the team who knows Arabic annotate the entities with the Rasa Arabic Annotation Helper (credit for the tool goes to the Rasa Arabic user group).
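The editor's right-to-left rendering is what makes these annotations hard to check. A display-independent view can help a non-Arabic reader, and the sketch below provides one: it lists each whitespace token with its character offsets and flags invisible bidirectional control characters. Those marks are an assumption worth ruling out, not something the thread identified. They can be picked up when copying RTL text between tools, and because each one occupies a character offset, they can shift entity spans.

```python
import re
import unicodedata
from typing import List, Tuple

# Invisible bidirectional formatting characters: LRM, RLM, ARABIC LETTER MARK,
# the embedding/override controls U+202A..U+202E and the isolates U+2066..U+2069.
BIDI_CONTROLS = (
    {"\u200e", "\u200f", "\u061c"}
    | {chr(c) for c in range(0x202A, 0x202F)}
    | {chr(c) for c in range(0x2066, 0x206A)}
)


def find_bidi_controls(text: str) -> List[Tuple[int, str]]:
    """Return (offset, Unicode name) for every bidi control character in text."""
    return [
        (i, unicodedata.name(ch, hex(ord(ch))))
        for i, ch in enumerate(text)
        if ch in BIDI_CONTROLS
    ]


def token_table(text: str) -> List[Tuple[int, int, str]]:
    """Whitespace tokens with their start/end offsets in logical order."""
    return [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]


sentence = "ما هي العلامات المبكرة لخلل التنسج الصدري"
for start, end, token in token_table(sentence):
    print(start, end, token)
print(find_bidi_controls(sentence))  # [] for a clean string
```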
