Hi, I’ve been searching for over a week now and I have a few config files for Arabic. I keep trying to use them, but they don’t seem to work for me. I’ve tried using spaCy, and I think there are quite a few things I might be misunderstanding. I would like to start by asking which libraries I should install. Also, I am using Google Colab, not that I think it makes a difference. Please help me.
Hi Israa,
my name is Vincent. I’m working on a library that contains NLU components that should make it easier for non-English users to make assistants with Rasa. The project is called rasa nlu examples and it can be found here: Rasa NLU Examples.
I don’t speak Arabic but I’d love to learn more about the problems that you’re encountering. My guess is that there might be something going wrong at the tokenisation part of your project. A typical machine learning pipeline for text involves taking the input text, turning it into tokens and then converting those tokens to numeric features for a machine learning model.
In English this is usually easy because you can often just split a sentence into words using the space character " ".
This “whitespace tokenizer” flow works well for a lot of languages that are similar to English but it might not work as well for Arabic. I can’t speak Arabic unfortunately, but I’m trying to add support to rasa nlu examples for tools that cover non-English languages like Arabic.
In particular I’ve added support for the following tools for Arabic;
- the stanza tokeniser
- the fasttext and bytepair embeddings
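To make the tokenise-then-featurise flow described above a bit more concrete, here is a minimal sketch in plain Python. It is not how Rasa implements its components; the function names (`whitespace_tokenize`, `build_vocabulary`, `count_vector`) and the example sentences are made up purely for illustration.

```python
from collections import Counter
from typing import Dict, List


def whitespace_tokenize(text: str) -> List[str]:
    """Split a text into tokens on runs of whitespace."""
    return text.split()


def build_vocabulary(tokenized_texts: List[List[str]]) -> Dict[str, int]:
    """Assign each distinct token an index, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in tokenized_texts:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def count_vector(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Turn a token list into a vector of per-token counts (bag of words)."""
    counts = Counter(tokens)
    return [counts.get(token, 0) for token in vocab]


# Hypothetical training examples, only for illustration.
texts = ["book a table", "book a flight to Riyadh"]

tokenized = [whitespace_tokenize(text) for text in texts]
vocab = build_vocabulary(tokenized)
vectors = [count_vector(tokens, vocab) for tokens in tokenized]

for text, vector in zip(texts, vectors):
    print(text, "->", vector)
```

The point of the sketch is that everything downstream depends on the tokens: if the tokenizer splits the text in a way that does not suit the language, the numeric features inherit that problem.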
If you can share a config.yml pipeline that you’ve tried locally and perhaps part of your nlu.yml file then I might be able to help you in more detail.
Also, I’ll be speaking at PyData Riyadh next week. It’s a virtual event but I’ll present some of the tools that I am working on and it might also be an opportunity to exchange ideas.
@koaning For starters, thank you for replying. Second, I’ve found some code on GitHub and ran it on my laptop using CMD, and it worked fine. However, I’m working on Colab, so I need it to work there. The problem is importing the tokenisation part through Colab. This is the link for the GitHub repository: GitHub - RedaElmar/CovidBot-Telegram: a conversational agent in Arabic Darija called CovidBot which aims to inform Moroccans about the evolution of the CoronaVirus pandemic in the kingdom, and also to help them better understand this virus, how to protect themselves from it and how to fight it. This is the Colab file I am trying to implement: Copy_of_Call_Center_Arabic_Bot.ipynb (55.5 KB)
Rasa is meant to be run/trained from the command line. There are lots of configuration files that need to be tracked so the notebook environment is probably not going to be ideal. That said, the configuration file looks right to me so if you run it all from the command line it should work. Is there a reason why you are running it from Colab?
The meetup I mentioned is now scheduled and can be found here.
@koaning I am working on Colab because the Nvidia card on my laptop is an 820M and, despite extensive research, I did not find a way to set up TensorFlow GPU. I figured out that I am supposed to configure the tokens through the Rasa-NLU Registry file in order for it to work using Colab, so now I am trying to figure out a way to do that. If you have any suggestions please let me know.
You don’t need a GPU for Rasa to be honest. In fact; I’ve never trained Rasa with a GPU. I run all of my Rasa workloads on an intel NUC that only has 6 CPU cores.
It depends slightly on your dataset size perhaps though. How many intents/entities/examples do you have?
@koaning It finally worked; it turns out the issue was with using Rasa_NLU and Rasa_Core. This is what I had to import to get it to work.
Install Rasa NLU and Rasa Core
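import sys
python = sys.executable  # assumed definition: the {python} placeholders below need this variable to point at the notebook's interpreter
!{python} -m pip install -U rasa
!{python} -m pip install sklearn_crfsuite
!{python} -m pip install tensorflow
!{python} -m pip install -U spacy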
import rasa #rasa =2.1.0
import spacy
import pandas
import sklearn_crfsuite
print("rasa: {}".format(rasa.__version__)) #to see the version of rasa being used
Training the Rasa NLU Model
from rasa.nlu.train import load_data
from rasa.nlu.config import RasaNLUModelConfig
from rasa.nlu.model import Trainer
from rasa.nlu import config # To import your config file, should be called directly above the rest of the NLU training part of the code and not at the beginning of your code
from rasa.nlu.test import run_evaluation
Training the Domain Model
from rasa.core.policies.policy import *
Start up the bot
from rasa.core.interpreter import *
Config used
config = '''
# https://rasa.com/docs/rasa/nlu/components/
language: ar
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
'''
%store config > config.yml
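Note that `%store config > config.yml` is an IPython magic, so it only works inside a notebook such as Colab. If the same step has to run as a plain Python script, a rough equivalent could look like the sketch below. The shortened pipeline is only a stand-in for the full config above, and the file name `config.yml` is the one already used in this thread.

```python
from pathlib import Path

# Shortened stand-in for the full pipeline above, for illustration only.
# YAML is indentation-sensitive, so nested options such as `epochs`
# must stay indented under the component they belong to.
config = """\
language: ar
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
"""

config_path = Path("config.yml")

# Write as UTF-8 so any non-ASCII text in the file survives the round trip.
config_path.write_text(config, encoding="utf-8")

# Read the file back to confirm what was actually written.
print(config_path.read_text(encoding="utf-8"))
```

Reading the file back right after writing it is a cheap way to catch lost indentation before training.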
Now, however, I’m facing a problem with from rasa.core.agent import Agent. If you have any suggestions please let me know, and if I figure it out I will also send the correct version.
I am also facing the same issue during the import of Agent. Let me know if you found a solution.
I’ve asked here: Rasa.core.agent import Agent Issue. I’ll definitely keep you updated; same goes for you.
Have you found any solution for building an Arabic chatbot with Rasa?
I need to support both Arabic and English in the same chatbot. Any advice on the NLU configs?