Rasa NLU Server Not Stable

Hello.

We are trying out Rasa, and while we got everything up and running, we find that the Rasa NLU server dies a lot and is generally not stable. Core is a little better. As an example, the last error that occurred after a training job is below. Training was requested through the API, and 8 minutes later the following error appeared in the log.

2019-01-31 19:43:04-0500 [-] 2019-01-31 19:43:04 WARNING rasa_nlu.data_router - [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>:
2019-01-31 19:43:04-0500 [-] ]
2019-01-31 19:43:04-0500 [_GenericHTTPChannelProtocol,2773,174.89.234.171] Unhandled Error Processing Request.
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/Twisted-18.9.0-py3.5-linux-x86_64.egg/twisted/internet/defer.py", line 501, in errback
    self._startRunCallbacks(fail)
  File "/usr/local/lib/python3.5/dist-packages/Twisted-18.9.0-py3.5-linux-x86_64.egg/twisted/internet/defer.py", line 568, in _startRunCallbacks
    self._runCallbacks()
  File "/usr/local/lib/python3.5/dist-packages/Twisted-18.9.0-py3.5-linux-x86_64.egg/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/usr/local/lib/python3.5/dist-packages/Twisted-18.9.0-py3.5-linux-x86_64.egg/twisted/internet/defer.py", line 1475, in gotResult
    _inlineCallbacks(r, g, status)
--- <exception caught here> ---
  File "/usr/local/lib/python3.5/dist-packages/Twisted-18.9.0-py3.5-linux-x86_64.egg/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/usr/local/lib/python3.5/dist-packages/Twisted-18.9.0-py3.5-linux-x86_64.egg/twisted/python/failure.py", line 491, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/rasa_nlu/server.py", line 362, in train
    RasaNLUModelConfig(model_config), model_name)
  File "/usr/local/lib/python3.5/dist-packages/Twisted-18.9.0-py3.5-linux-x86_64.egg/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/home/ubuntu/.local/lib/python3.5/site-packages/rasa_nlu/data_router.py", line 356, in training_errback
    failure.value.failed_target_project)
builtins.AttributeError: 'CancelledError' object has no attribute 'failed_target_project'

OS: Ubuntu 16.04
Python: 3.5
Rasa NLU: 0.13.8
Pipeline: TensorFlow

After the error, the server reported 1 training job running, and the maximum number of training jobs allowed was also 1. As a result, no more training could be done until the NLU process was killed and restarted.
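One possible reading of the traceback, offered as a guess and not as a confirmed diagnosis: the error handler reads `failed_target_project` from the exception, a `CancelledError` has no such attribute, so the handler itself fails. If the job counter is only released later in that same handler, the slot would never be freed, which would match the stuck count of 1. The sketch below reproduces that failure mode in plain Python. It is not the rasa_nlu source; `Router`, `FakeCancelledError`, `TrainingError`, and both errback methods are hypothetical names used only for illustration.

```python
class FakeCancelledError(Exception):
    """Stand-in for twisted.internet.defer.CancelledError (no project info)."""


class TrainingError(Exception):
    """Hypothetical training failure that records which project failed."""

    def __init__(self, failed_target_project):
        super().__init__("training failed for " + failed_target_project)
        self.failed_target_project = failed_target_project


class Router:
    """Hypothetical router that tracks running training jobs with a counter."""

    def __init__(self, max_training_processes=1):
        self.max_training_processes = max_training_processes
        self.current_training_processes = 0

    def start_training(self):
        # Refuse new jobs once every slot is taken.
        if self.current_training_processes >= self.max_training_processes:
            raise RuntimeError("max training processes reached")
        self.current_training_processes += 1

    def fragile_errback(self, error):
        # Reads the attribute first; an AttributeError here skips the decrement.
        project = error.failed_target_project
        self.current_training_processes -= 1
        return project

    def defensive_errback(self, error):
        # Releases the slot first, then reads the attribute only if present.
        self.current_training_processes -= 1
        return getattr(error, "failed_target_project", None)


def demo():
    """Return the job counts left behind by the fragile and defensive handlers."""
    fragile = Router()
    fragile.start_training()
    try:
        fragile.fragile_errback(FakeCancelledError())
    except AttributeError:
        pass  # the handler itself crashed, so the slot was never released
    stuck = fragile.current_training_processes

    defensive = Router()
    defensive.start_training()
    defensive.defensive_errback(FakeCancelledError())
    released = defensive.current_training_processes
    return stuck, released


if __name__ == "__main__":
    print(demo())
```

In this sketch the fragile handler leaves the counter occupied, so every later `start_training()` call is refused until the process restarts, while the defensive handler frees the slot regardless of the exception type.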

Is there a more detailed logging level available, and what does the above error mean? Why did it kill the whole process?

Thank you

I would like to update this topic because I’m facing the same issue.

Did you find a solution?

Hi Romain. No, I didn’t. We have cut down our use of API training for now and do all training from the command line. It seems we will have to debug the Rasa NLU train API code ourselves, and I am not sure this forum is monitored well.

Hey @Serge. Apologies for the late response on this. Just to understand the issue better: how much training data are you using for training?

Hi @Juste. It’s interesting that you asked this question right away. We did notice that the initial files we started with worked, but as they got bigger we started to see issues with the API (e.g. via Postman). Currently my NLU training file is 146 KB (in JSON format). Is there a size limit beyond which the API should not be used? Or is there a parameter I can pass to the nlu.server? Any feedback will help. Thank you, Serge

Hey @Serge. We have seen some issues from the community on that end too. However, your training data still seems to be on the small side, which shouldn’t cause the model to run slowly and therefore the server to time out. Let me think of other possible reasons for this.

Continuing the discussion from Rasa NLU Server Not Stable:

Hi, I am getting the same error. It appeared after I added my own pipeline, shown below.

Error:
2019-11-14 12:50:28+0000 [-] 2019-11-14 12:50:28 WARNING rasa_nlu.data_router - [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.defer.CancelledError'>:

pipeline: [
  { "name": "nlp_mitie", "model": "/app/data/mitie/total_word_feature_extractor.dat" },
  { "name": "tokenizer_mitie" },
  { "name": "ner_mitie" },
  { "name": "ner_synonyms" },
  { "name": "intent_entity_featurizer_regex" },
  { "name": "intent_featurizer_mitie" },
  { "name": "intent_classifier_sklearn", "C": [1, 2, 5, 10, 20, 100], "kernel": "linear" }
]
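Before sending a custom pipeline like this to the server, it may be worth checking that it parses as JSON at all, since curly quotes picked up from a web page or document editor make it invalid. The snippet below is a minimal sketch of such a check using only the Python standard library. `component_names` is a hypothetical helper, not part of Rasa, and passing this check says nothing about whether the components are installed or whether it relates to the CancelledError.

```python
import json

# The pipeline from the post above, written with straight quotes.
PIPELINE_JSON = """
[
  {"name": "nlp_mitie", "model": "/app/data/mitie/total_word_feature_extractor.dat"},
  {"name": "tokenizer_mitie"},
  {"name": "ner_mitie"},
  {"name": "ner_synonyms"},
  {"name": "intent_entity_featurizer_regex"},
  {"name": "intent_featurizer_mitie"},
  {"name": "intent_classifier_sklearn", "C": [1, 2, 5, 10, 20, 100], "kernel": "linear"}
]
"""


def component_names(pipeline_text):
    """Parse a JSON pipeline and return its component names in order.

    Raises ValueError if the text is not valid JSON, is not a list,
    or contains an entry without a "name" key.
    """
    pipeline = json.loads(pipeline_text)  # JSONDecodeError subclasses ValueError
    if not isinstance(pipeline, list):
        raise ValueError("pipeline must be a JSON list of components")
    names = []
    for index, component in enumerate(pipeline):
        if not isinstance(component, dict) or "name" not in component:
            raise ValueError("component %d has no 'name' key" % index)
        names.append(component["name"])
    return names


if __name__ == "__main__":
    for name in component_names(PIPELINE_JSON):
        print(name)
```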

Does anyone know how to fix this? I am also seeing the same error and could not figure out how to resolve it.