I'm trying to develop a chatbot in Urdu using rasa_nlu. My model classifies intents correctly but fails to extract entities. I'm using Python 3.7.2 with rasa_nlu 0.14.6 on a Windows 10 machine.
I've made sure that the training data is in the correct format, i.e. the start and end positions of the entities are correct (a quick check is included after the example below), but I can't figure out what the issue is.
The following are the contents of my config.yml file:
language: "ur"

pipeline:
- name: "tokenizer_whitespace"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
For reference, the following is an example of what my training data looks like:
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "intent": "پوچھنا",
        "entities": [
          {
            "end": 16,
            "entity": "نام",
            "start": 12,
            "value": "عامر"
          }
        ],
        "text": "اسلام علیکم عامر"
      }
    ]
  }
}
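A quick way to sanity-check the offsets is a small script along these lines (nlu_data.json is just a placeholder for wherever the training file lives), which confirms that text[start:end] really equals each entity value:

import json

# Load the Rasa NLU training file (path is a placeholder).
with open("nlu_data.json", encoding="utf-8") as f:
    data = json.load(f)

# For every example, check that the slice text[start:end] matches the entity value.
for example in data["rasa_nlu_data"]["common_examples"]:
    text = example["text"]
    for entity in example.get("entities", []):
        span = text[entity["start"]:entity["end"]]
        if span != entity["value"]:
            print(f"Mismatch in {text!r}: got {span!r}, expected {entity['value']!r}")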
Hi @hijab10, have you tried running on python 3.6? Rasa NLU doesn’t yet support python 3.7 because there’s not yet a rasa-supported version of tensorflow for it. I’m actually surprised you’re not getting import errors if you’re using the tensorflow intent classifier in your pipeline. What version of tensorflow are you running?
Hey @erohmensing, thank you for responding. I'm using tensorflow 1.13.1, and I was actually able to resolve my issue by changing the entity name to English while keeping the value in Urdu. I don't know why it works this way, but I'm not complaining. And no, I didn't get any import errors. However, when I ran the same configuration in a Docker instance, I was getting the "Unicode error", I guess because Urdu is UTF-8 encoded.
This is what my training data looks like now:
{
  "rasa_nlu_data": {
    "common_examples": [
      {
        "intent": "greet",
        "entities": [
          {
            "end": 16,
            "entity": "name",
            "start": 12,
            "value": "عامر"
          }
        ],
        "text": "اسلام علیکم عامر"
      }
    ]
  }
}
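To confirm the entity now actually gets extracted, a minimal train-and-parse round trip with the 0.14-style rasa_nlu API looks roughly like this (the file paths are placeholders):

from rasa_nlu import config
from rasa_nlu.model import Interpreter, Trainer
from rasa_nlu.training_data import load_data

# Train a model from the training data and pipeline config shown above.
training_data = load_data("nlu_data.json")
trainer = Trainer(config.load("config.yml"))
trainer.train(training_data)
model_directory = trainer.persist("./models")

# Parse an Urdu message and check that the "name" entity shows up in the output.
interpreter = Interpreter.load(model_directory)
print(interpreter.parse("اسلام علیکم عامر"))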
Okay, I'm glad it's working! I think 1.13.1 actually does work with 3.7; we're working on getting 3.7 support because of that (I don't think 1.13.0 did).
The Unicode error sounds strange; I believe we stopped supporting anything that isn't UTF-8 encoded. Would you mind posting the error?
Sorry for the late response. I think it is some issue with pycrfsuite.
The error I was getting:
"UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-4: ordinal not in range(128)"
The entire traceback is as follows:
File "C:\Users\hijab\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
File "C:\Users\hijab\AppData\Local\Programs\Python\Python37\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
File "C:\Users\hijab\AppData\Local\Programs\Python\Python37\lib\site-packages\rasa_nlu\train.py", line 184, in <module>
    num_threads=cmdline_args.num_threads)
File "C:\Users\hijab\AppData\Local\Programs\Python\Python37\lib\site-packages\rasa_nlu\train.py", line 154, in do_train
    interpreter = trainer.train(training_data, **kwargs)
File "C:\Users\hijab\AppData\Local\Programs\Python\Python37\lib\site-packages\rasa_nlu\model.py", line 196, in train
    **context)
File "C:\Users\hijab\AppData\Local\Programs\Python\Python37\lib\site-packages\rasa_nlu\extractors\crf_entity_extractor.py", line 142, in train
    self._train_model(dataset)
File "C:\Users\hijab\AppData\Local\Programs\Python\Python37\lib\site-packages\rasa_nlu\extractors\crf_entity_extractor.py", line 551, in _train_model
    self.ent_tagger.fit(X_train, y_train)
File "C:\Users\hijab\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn_crfsuite\estimator.py", line 321, in fit
    trainer.append(xseq, yseq)
File "pycrfsuite/_pycrfsuite.pyx", line 312, in pycrfsuite._pycrfsuite.BaseTrainer.append
File "stringsource", line 48, in vector.from_py.__pyx_convert_vector_from_py_std_3a__3a_string
File "stringsource", line 15, in string.from_py.__pyx_convert_string_from_py_std__in_string
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-4: ordinal not in range(128)
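My guess is that the container was running with a C/ASCII locale, which would explain why the encode step fell back to the ascii codec; a quick diagnostic inside the container is something like this:

import locale
import sys

# If the preferred encoding reports an ASCII/ANSI locale (e.g. "ANSI_X3.4-1968"),
# the container has no UTF-8 locale configured and non-ASCII strings can end up
# being pushed through the 'ascii' codec.
print(sys.getdefaultencoding())
print(locale.getpreferredencoding())
print(sys.stdout.encoding)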