Support for code mixed languages?

aashishgangwan1 · August 9, 2018, 10:54am

Hey, I was wondering if we can have Rasa NLU to identify code mixed languages like Singlish? Reference: https://en.wikipedia.org/wiki/Singlish

nmstoker · August 11, 2018, 10:59am

Have you looked here? (“always check the docs first!!”) https://rasa.com/docs/nlu/languages/ There’s some detail there that would be relevant.

It might be worth having a go with the tensorflow_embedding backend: although it doesn’t mention code mixed input, the documentation does say this:

The tensorflow_embedding pipeline can be used for any language, because it trains custom word embeddings for your domain.

Since this approach doesn’t use pretrained language data you might be okay, although bear in mind that words you want it to learn would need to be well represented in your training data (it’s not magic!)
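One rough way to check whether your code mixed words are well represented is to count how often each token appears across your training examples. This is only a sketch using plain Python, not anything from Rasa itself: the `rare_tokens` helper, the whitespace tokenisation, the `min_count` threshold and the example utterances are all made up for illustration.

```python
from collections import Counter


def rare_tokens(examples, min_count=2):
    """Return tokens seen fewer than min_count times across all examples.

    Tokenisation here is a naive lowercase whitespace split (an assumption);
    swap in whatever tokeniser your pipeline actually uses.
    """
    counts = Counter(
        token
        for text in examples
        for token in text.lower().split()
    )
    return sorted(tok for tok, n in counts.items() if n < min_count)


# Hypothetical Singlish-style training utterances, for illustration only.
examples = [
    "can help me check balance lah",
    "check my balance can or not",
    "why so slow one lah",
]

# Tokens listed here probably need more training examples.
print(rare_tokens(examples))
```

Anything that shows up in the output is a word the model has seen too rarely to learn much about, so it is a candidate for adding more examples.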

If you wanted to try the other backends, I think you’d find it much harder. Whilst the spaCy backend can work with different pre-trained word vectors, you’d need to source a Singlish one yourself (unless someone has made one already).

Finally, it’s only tangentially relevant, but I did spot this paper about creating code mixed datasets: [1806.05997v1] A Dataset for Building Code-Mixed Goal Oriented Conversation Systems

asimzaman · August 11, 2018, 7:35pm

@nmstoker I know it’s not good to place my question here, but I need an urgent response: how can I define intents for a QnA chatbot for an LMS, where students can ask questions on subjects like Computer Science?

aashishgangwan1 · August 12, 2018, 4:56am

Thanks!

aashishgangwan1 · August 12, 2018, 4:57am

How about one intent per question? I am not sure but that could be a way.

nmstoker · August 12, 2018, 10:58am

@asimzaman you could try an intent per question as @aashishgangwan1 suggests. It’ll be fine for a narrow set of known questions, but it gets awkward if you want it to scale (you’ll need to add a new intent for each question).

It sounds like you’re under time pressure so this may be a little ambitious, but tools like DrQA are looking into generalisable approaches that search a knowledge base. A term to Google in this field is KBQA (Knowledge Base QnA). The DrQA repo is here: GitHub - facebookresearch/DrQA: Reading Wikipedia to Answer Open-Domain Questions

The sort of thing you could possibly do in a more limited timeframe using Rasa might be to create intents for your broad categories of questions plus off-topic questions. Then for the on-topic ones use the user’s keywords to search a relevant knowledge base (e.g. stick your questions and answers into something like Elasticsearch or even a SQLite DB).
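To give a feel for the knowledge base half of that idea, here is a minimal sketch using Python’s built-in sqlite3 module. It assumes the broad category has already come back from your NLU model as the intent name; the table layout, the stopword list, the helper names (`build_kb`, `keywords`, `best_answer`) and the sample rows are all hypothetical, and the word-overlap scoring is deliberately crude.

```python
import sqlite3

# Tiny, made-up stopword list; a real one would be much longer.
STOPWORDS = {"what", "is", "a", "an", "the", "how", "do", "i", "of", "in"}


def build_kb(rows):
    """Create an in-memory SQLite table of (category, question, answer) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE faq (category TEXT, question TEXT, answer TEXT)")
    conn.executemany("INSERT INTO faq VALUES (?, ?, ?)", rows)
    return conn


def keywords(text):
    """Lowercase, drop question marks, split on whitespace, remove stopwords."""
    return [w for w in text.lower().replace("?", " ").split() if w not in STOPWORDS]


def best_answer(conn, category, user_text):
    """Return the answer whose stored question shares the most keywords
    with the user's text, or None if nothing overlaps."""
    words = keywords(user_text)
    best, best_score = None, 0
    query = "SELECT question, answer FROM faq WHERE category = ?"
    for question, answer in conn.execute(query, (category,)):
        q_words = set(keywords(question))
        score = sum(1 for w in words if w in q_words)
        if score > best_score:
            best, best_score = answer, score
    return best


# Hypothetical knowledge base rows, for illustration only.
conn = build_kb([
    ("computer_science", "What is a binary tree?",
     "A tree where each node has at most two children."),
    ("computer_science", "What is recursion?",
     "A function calling itself on a smaller input."),
    ("admin", "How do I reset my password?",
     "Use the reset link on the login page."),
])

# The category would normally be the intent predicted by the NLU model.
print(best_answer(conn, "computer_science", "explain recursion"))
```

The point of the split is that adding a new question only means adding a row to the table, not a new intent and retraining.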

As a side note, stressing the urgency doesn’t tend to make people more likely to respond in forums (unless they know you or it’s life-threatening!)

Best of luck with working on a solution - I’m sure others would like to know what you manage to create, so once you’ve made progress, why not post it on the Projects section?

nmstoker · August 12, 2018, 11:02am

Also there’s a video for a project I did a while back that may be slightly relevant for both of you (in different ways) https://youtu.be/xSN5fY5uYYg

It handles language detection (but sadly not code mixed language!) and then categorises a variety of questions into brief topics (in this case academic subjects, but for @asimzaman it could be sub-groups of questions). There’s no “answering” backend, so it’s not helping you there @asimzaman, but it might give some insight with intent examples.

The code is on GitHub too
