For my chatbot project, I want to use Rasa, but I have to build the chatbot in Bangla. I’m not sure what level of customization I need to make that possible.
I have a huge amount of unstructured data, but how do I process it for the chatbot, given that all intents and entities will be in Bangla?
Building a chatbot in Bangla with Rasa should be doable. I’m assuming you have some basic knowledge of how to build a chatbot with Rasa; if not, please let me know or look into the Rasa Tutorial. You need to do the following:
(1) Create training data from your unstructured data. See Training Data Format for the NLU data format.
(2) Use the supervised_embeddings pipeline (see Choosing a Pipeline). This pipeline does not rely on pre-trained word vectors; it learns everything from your own training data, so it is independent of the language.
(3) The above pipeline uses the WhitespaceTokenizer. A tokenizer splits an incoming message into individual words/tokens. I don’t know the language, so the tokenizer might not perform well on it. If the WhitespaceTokenizer turns out not to work well, you might want to create your own (see Custom NLU Components).
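To get a feel for point (3) before writing a custom component, here is a minimal sketch in plain Python. It does not use Rasa's API at all; the function names and the punctuation set are made up for this example. It only shows what splitting on whitespace does to a short Bangla sentence, and one possible way to separate the danda (।, the Bangla sentence terminator) from the last word.

```python
import re

# Bangla danda (U+0964) plus some common ASCII punctuation.
# Assumption: this set is illustrative only, not an exhaustive list.
PUNCTUATION = "।,?!;:\"'()"


def whitespace_tokenize(text):
    """Split on whitespace only (hypothetical helper, not Rasa's tokenizer)."""
    return text.split()


def bangla_tokenize(text):
    """Replace punctuation with spaces, then split on whitespace.

    Hypothetical helper: a real tokenizer may need more rules than this.
    """
    pattern = "[" + re.escape(PUNCTUATION) + "]"
    spaced = re.sub(pattern, " ", text)
    return spaced.split()


message = "আমি ভাত খাই।"  # "I eat rice."
print(whitespace_tokenize(message))  # the danda stays attached to the last word
print(bangla_tokenize(message))      # the danda is removed
```

If the second output looks closer to how you would split the sentence by hand, the same logic could be moved into a custom tokenizer component as described in Custom NLU Components.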