Can Rasa NLU be used for the Sinhala language? If so, what needs to be done?
Hey @uthpala-era. Yes, I know some people from the community who have successfully built a bot with Rasa in Sinhala. To do so, you should use the TensorFlow embedding pipeline, which allows you to build bots with Rasa in any language that can be tokenized (Sinhala in your case). You can read more about this pipeline here.
To implement your bot, follow the regular implementation process: create training data for the NLU and Core models (the NLU examples should, of course, be in your chosen language), define the tensorflow_embedding pipeline, and use it to train the NLU model. Training the Rasa Core model works the same way as for any other language.
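As a rough illustration of the pipeline step, an NLU configuration for the older Rasa NLU releases discussed here could look like the sketch below. The language code `si` (ISO 639-1 for Sinhala) is an assumption on my part, and the pipeline name should be checked against the documentation for the version you have installed.

```yaml
# config.yml: a minimal sketch for older Rasa NLU releases that ship
# the "tensorflow_embedding" pipeline template
language: "si"                    # assumed ISO 639-1 code for Sinhala
pipeline: "tensorflow_embedding"  # learns embeddings from your own training data, so no pre-trained word vectors are required
```

Because this pipeline learns its features from the training examples themselves, the quality of the model depends mostly on how many Sinhala examples you provide per intent.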
Thank you @Juste. I tried an example of creating a chatbot for English, and it worked nicely. The problem appeared when I switched to Sinhala. In my config.json I used ‘spacy_sklearn’ as the pipeline. I also have a data.json with training data in Sinhala, and in the templates section of domain.yml I have included Sinhala text responses. However, due to these Unicode characters, when I try to train on the data I get the error shown below. <CoreError: error code 3: Unable to load any data from source yaml file: Path ‘/’. So I can’t continue. Could you please guide me through this?
Hello @uthpala-era. I wouldn’t recommend using spaCy because there are no word vectors for Sinhala yet. Instead, I would suggest using the tensorflow_embedding pipeline, which allows you to build assistants regardless of the language. Regarding the error you are getting: do you have intents, entities or action names which contain Sinhala characters?
Thanks again for responding, @Juste. I see. I am building this chatbot for research, which is why I tried to use the spacy_sklearn pipeline. Wouldn’t it be possible for me to build the word vectors myself?
And I do not have intents, entities or action names with Sinhala characters.
My sample domain.yml file looks like this:
- text: “හායි”
- text: “නැවත හමුවෙමු”
Is this exactly how your domain file is formatted? If yes, then the issue is because of the formatting - utter_goodbye is outside the templates section. Can you double check that? Here is how a domain should be formatted.
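For reference, a minimal sketch of a correctly nested domain file in the older Rasa Core format used in this thread is shown below. The intent names and `utter_greet` are illustrative assumptions; only `utter_goodbye` and the two response texts come from the posts above.

```yaml
# domain.yml: sketch of the expected nesting (names other than
# utter_goodbye are hypothetical)
intents:
  - greet
  - goodbye

actions:
  - utter_greet
  - utter_goodbye

templates:
  utter_greet:          # every utter_* entry must be indented under "templates"
    - text: "හායි"
  utter_goodbye:
    - text: "නැවත හමුවෙමු"
```

The identifiers stay in ASCII while the response texts are in Sinhala; saving the file as UTF-8 and keeping the indentation consistent (spaces, not tabs) avoids most loading errors.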
@Juste, fortunately, after several tries it worked for me. It could have been a formatting issue. Thanks for the guidance. Also, I realised just today that the video tutorial I had followed was yours, “Creating a chatbot with Rasa NLU and Rasa Core”. I should mention that it was very helpful to me. Great work.
Glad to hear that! Sometimes it’s just a space in the wrong place or an indentation error that makes it break, but I am glad you solved the issue. Also happy to hear that the tutorial was helpful!
Is there any possibility for us to develop word vectors for other languages? (In my case, I want to develop word vectors for Sinhala.)
Thanks a lot. I will try those out.
@Juste, could you please help me with another issue? I am now using the latest version of Rasa (not Rasa X). Thanks to you guys it is pretty simple now: we can train using rasa train and run the bot using rasa run. I have integrated my bot with Slack. When I type simple greeting messages, it replies nicely. But when a message contains an entity, the bot sometimes gets stuck. Is there any mechanism for the bot developer to check what caused the error? Could you please tell me where the error log is?
I would be very grateful if you could reply at your earliest convenience. Thanks.
Sir, you mentioned that you made a chatbot in Sinhala. Could you tell me what kind of changes you made to the Rasa setup? I want to make a chatbot in Hindi, so I would like to know what changes I need to make in Rasa for it to understand and respond in Hindi. Please respond.
Sir, could you guide me on how to make a Rasa-based chatbot in Hindi, with both input and output as Hindi phrases? Please let me know if you have any ideas.
Link to bot:
- 좋은 아침
- 좋은 저녁
- 주위에 당신을보고
- 나중에 봐
- 그 좋은 소리
- 난 그렇게 생각하지 않아
- 좋아하지 않아
- 절대 안돼
- 아주 좋아
- 기분이 아주 좋아
- 나는 잘 지내고있어
- 난 괜찮아
- 너무 슬퍼
- 매우 나쁘다
- 아주 좋은하지
- 매우 슬프다
- 너무 슬퍼
- 당신은 봇입니까?
- 당신은 인간입니까?
- 봇과 대화하고 있습니까?
- 인간과 대화하고 있습니까?
Use these lines to test it.
Read this to build your own:
Hi @athenasaurav, can you share the pipeline from your config.yml?
Here is my configuration file:
language: kr
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
policies:
  - name: KerasPolicy
    epochs: 200
    max_history: 3
  - name: MemoizationPolicy
    max_history: 3
Thank you for this. I’m building a chatbot in Urdu. How do I check whether Urdu is supported?
@athenasaurav, thank you very much for the reply. However, the link you provided is not working.