Rasa X Chat UI Hangs + Trouble understanding docker-compose logs errors

I got custom actions working on Rasa X, but the chat hangs on every input. Oddly, once I refresh the browser and open the Conversations tab, the output is generally correct and the API behind the custom action has been called successfully.

I checked the docker-compose logs and found errors from three services: (1) rasa_x, (2) rasa_production, and (3) rasa_worker. Here are the error logs:

rasa_x

rasa-x_1           | Starting Rasa X server... 🚀
rasa-x_1           | INFO:rasax.community.services.integrated_version_control.git_service:Cloning git repository from URL 'git@github.com:slcheungcasado/rasa-medbot.git'.
rasa-x_1           | Exception occurred while handling uri: 'http://34.92.254.220/api/projects/default/git_repositories/2/status'
rasa-x_1           | Traceback (most recent call last):
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/sanic/app.py", line 976, in handle_request
rasa-x_1           |     response = await response
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/api/decorators.py", line 204, in decorated_function
rasa-x_1           |     return await await_and_return_response(args, kwargs, request)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/api/decorators.py", line 134, in await_and_return_response
rasa-x_1           |     response = await response
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/api/blueprints/git.py", line 151, in get_repository_status
rasa-x_1           |     repository_status = git_service.get_repository_status()
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/services/integrated_version_control/git_service.py", line 716, in get_repository_status
rasa-x_1           |     is_remote_ahead = self.is_remote_branch_ahead()
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/services/integrated_version_control/git_service.py", line 553, in is_remote_branch_ahead
rasa-x_1           |     number_of_commits_behind = sum(1 for _ in commits_behind)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/services/integrated_version_control/git_service.py", line 553, in <genexpr>
rasa-x_1           |     number_of_commits_behind = sum(1 for _ in commits_behind)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/git/objects/commit.py", line 277, in _iter_from_process_or_stream
rasa-x_1           |     finalize_process(proc_or_stream)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/git/util.py", line 328, in finalize_process
rasa-x_1           |     proc.wait(**kwargs)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/git/cmd.py", line 408, in wait
rasa-x_1           |     raise GitCommandError(self.args, status, errstr)
rasa-x_1           | git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
rasa-x_1           |   cmdline: git rev-list master..origin/master --
rasa-x_1           |   stderr: 'fatal: bad revision 'master..origin/master'
rasa-x_1           | '

rasa_production

rasa-production_1  | 2020-03-24 08:20:41.006943: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
rasa-production_1  | 2020-03-24 08:20:41.007349: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
rasa-production_1  | 2020-03-24 08:20:41.007393: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
rasa-production_1  | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=21, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.19.0.8', 36088)>; error=111 (Connection refused)
rasa-production_1  | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.19.0.4', 5672))
rasa-production_1  | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-production_1  | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=25, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.19.0.8', 36104)>; error=111 (Connection refused)
rasa-production_1  | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.19.0.4', 5672))
rasa-production_1  | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-production_1  | /opt/venv/lib/python3.6/site-packages/rasa/utils/common.py:347: UserWarning: Interpreter parsed an intent 'hi' which is not defined in the domain. Please make sure all intents are listed in the domain.
rasa-production_1  |   More info at https://rasa.com/docs/rasa/core/domains/
rasa-production_1  | 2020-03-24 08:25:02 ERROR    rasa.core.brokers.pika  - Could not open Pika channel at host 'rabbit'. Failed with error: Channel is closed.
rasa-production_1  | 2020-03-24 08:25:21.332683: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
rasa-production_1  | 2020-03-24 08:33:26.038624: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
rasa-production_1  | 2020-03-24 08:33:26.040372: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
rasa-production_1  | 2020-03-24 08:33:26.040535: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
rasa-production_1  | 2020-03-24 08:33:53.316035: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

rasa_worker

rasa-worker_1      | 2020-03-24 08:20:40.652898: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
rasa-worker_1      | 2020-03-24 08:20:40.653426: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
rasa-worker_1      | 2020-03-24 08:20:40.653472: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
rasa-worker_1      | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=21, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.19.0.7', 36370)>; error=111 (Connection refused)
rasa-worker_1      | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.19.0.4', 5672))
rasa-worker_1      | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-worker_1      | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=25, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.19.0.7', 36392)>; error=111 (Connection refused)
rasa-worker_1      | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.19.0.4', 5672))
rasa-worker_1      | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-worker_1      | 2020-03-24 08:22:40.292134: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
rasa-worker_1      | /opt/venv/lib/python3.6/site-packages/rasa/core/policies/ensemble.py:310: FutureWarning: 'KerasPolicy' is deprecated and will be removed in version 2.0. Use 'TEDPolicy' instead.
rasa-worker_1      |   policy_object = constr_func(**policy)
rasa-worker_1      | /opt/venv/lib/python3.6/site-packages/rasa/nlu/config.py:50: FutureWarning: You are using a pipeline template. All pipelines templates are deprecated and will be removed in version 2.0. Please add the components you want to use directly to your configuration file.
rasa-worker_1      |   return RasaNLUModelConfig(config)
rasa-worker_1      | /opt/venv/lib/python3.6/site-packages/rasa/utils/common.py:347: UserWarning: 'CRFEntityExtractor' is deprecated and will be removed in version 2.0. Use 'DIETClassifier' instead.
rasa-worker_1      |   More info at https://rasa.com/docs/rasa/migration-guide/
rasa-worker_1      | 2020-03-24 08:22:51 WARNING  rasa.nlu.classifiers.diet_classifier  - Please configure the number of 'epochs' in your configuration file. We will change the default value of 'epochs' in the future to 1. 
rasa-worker_1      | 2020-03-24 08:33:26.035055: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
rasa-worker_1      | 2020-03-24 08:33:26.035448: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
rasa-worker_1      | 2020-03-24 08:33:26.035492: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
rasa-worker_1      | 2020-03-24 08:33:53.316035: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

The rasa_x logs suggest something is wrong with the Git integration with my GitHub repo, yet the repository was pulled just fine.
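For what it's worth, `fatal: bad revision 'master..origin/master'` is what git prints when it cannot resolve one end of that range, so a successful clone and this error can coexist, for instance if the repository's branch is not named `master` (that cause is only a guess on my part). A minimal Python sketch for checking which end is missing; the helper names are hypothetical, and `list_short_refs` assumes `git` is on the PATH and that you point it at the clone inside the rasa-x container:

```python
import subprocess
from typing import Iterable, List


def revisions_in_range(rev_range: str) -> List[str]:
    """Split a two-dot range such as 'master..origin/master' into its two ends."""
    left, sep, right = rev_range.partition("..")
    if not sep or not left or not right:
        raise ValueError(f"not a two-dot revision range: {rev_range!r}")
    return [left, right]


def missing_revisions(rev_range: str, known_refs: Iterable[str]) -> List[str]:
    """Return the ends of the range that are not among the known branch names."""
    known = set(known_refs)
    return [rev for rev in revisions_in_range(rev_range) if rev not in known]


def list_short_refs(repo_dir: str) -> List[str]:
    """List local and remote-tracking branch names (short form) of a clone."""
    result = subprocess.run(
        ["git", "for-each-ref", "--format=%(refname:short)",
         "refs/heads", "refs/remotes"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


# Example with made-up branch names: a repo whose only branch is 'main'.
print(missing_revisions("master..origin/master", ["main", "origin/main"]))
```

If both ends come back as missing, the range in the log can never resolve, whatever the state of the remote.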

Both the rasa_production and rasa_worker logs mention refused connections and several TensorFlow messages. They also mention GPU usage, but according to the GCP documentation, the zone I chose for my VM instance, asia-east2-b, doesn’t have GPUs available (I also double-checked my instance’s machine configuration).
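On the refused connections: port 5672 is RabbitMQ's default AMQP port, and the "Connection refused" entries appear only at 08:20:45 and 08:20:50, which would fit the broker simply not being up yet when the Rasa containers started. That is an interpretation, not something the logs prove. A minimal Python sketch for checking reachability from inside a container; the function name `wait_for_port` and the timing values are my own choices, and the host name `rabbit` is taken from the later log line:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0,
                  interval: float = 1.0) -> bool:
    """Retry a TCP connect until it succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on host:port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers 'Connection refused', DNS failures and connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Hypothetical usage from inside the rasa-production container:
#   wait_for_port("rabbit", 5672, timeout=30)
```

If this returns True shortly after startup, the early pika errors were most likely just a start-order race rather than a misconfiguration.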

Edit:

Virtual Machine Details:

  • n1-standard-2 (2 vCPUs, 7.5 GB memory)

  • 100 GB Disk

  • Ubuntu 18

RASA_X_VERSION=0.26.1

RASA_VERSION=1.8.0

RASA_X_DEMO_VERSION=0.26.0

I don’t think the mismatch between RASA_X_VERSION and RASA_X_DEMO_VERSION matters, since RASA_X_DEMO_VERSION is only used by the app service in docker-compose.yml, and that image is being overridden with my custom action server image.

Python 3.6.9

Docker version 19.03.8, build afacb8b7f0

docker-compose version 1.25.4, build unknown

Some guidance would be greatly appreciated.

I have managed to narrow down the log to these two messages:

rasa-worker_1      | 2020-03-24 10:44:43.737337: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

This occurs when I first press Train.

rasa-production_1  | 2020-03-24 10:48:22.457976: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

Meanwhile this occurs when I first send a message using the interactive learning UI.

I found that I need to press Train a second time to get Rasa X to actually train a model. Likewise, in any interactive learning conversation the first message gets swallowed, but from the second message onward the bot behaves as intended.

I will be setting up another VM instance in a zone that has GPU available to see if this makes any difference, but I’m just guessing at this point. :pensive:
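To reduce the guessing, it can help to sort the log lines by kind first. The `libnvinfer`/TensorRT warnings and the `cuInit` message are commonly reported on machines without a usable GPU, where TensorFlow normally carries on using the CPU, so they may be unrelated to the hang. A rough Python sketch, where the category names and the patterns are my own assumptions based only on the lines pasted above:

```python
import re
from collections import Counter
from typing import Iterable

# (label, pattern) pairs, checked in order; the first match wins.
RULES = [
    ("gpu-probe", re.compile(r"libnvinfer|TensorRT|failed call to cuInit")),
    ("broker-connect", re.compile(r"pika|AMQPConnector", re.IGNORECASE)),
    ("deprecation", re.compile(r"FutureWarning|is deprecated")),
    ("git", re.compile(r"GitCommandError|fatal: bad revision")),
]


def classify(line: str) -> str:
    """Return the label of the first rule matching the log line, else 'other'."""
    for label, pattern in RULES:
        if pattern.search(line):
            return label
    return "other"


def summarize(lines: Iterable[str]) -> Counter:
    """Count non-empty log lines per category."""
    return Counter(classify(line) for line in lines if line.strip())


# Hypothetical usage: docker-compose logs --no-color > logs.txt, then
#   with open("logs.txt") as f:
#       print(summarize(f))
```

Whatever ends up under "other" is then the short list worth reading closely.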

Hi @slcheungcasado, I’ve seen these types of errors too:

E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

But I haven’t yet seen it actually cause a failure. Does the "Talk to your assistant" page always ignore the first message, or does it work if you refresh the page before starting, or give it a few seconds first?

I tried giving it about a minute or two, but that doesn’t seem to make a difference. It seems to work if I train a new model, set that model as active, and then I refresh the page prior to using talk to your assistant.

On another note, if I stop the server and start it back up, the chat UI behaves worse: the bot doesn’t reply at all (but again, if I refresh, the recorded conversation is correct). This behavior goes away if I train a new model, set it as active, refresh, and then chat with the newly trained bot.

Thanks for the feedback. I’ve seen similar behavior but not sure what causes it. I’ll pass this on as feedback to the development team :+1:

Re. the chat hanging issue - this is resolved in rasa-x==0.27

@mloubser Hi, it seems I have this issue too: the chat UI hangs and the chatbot doesn’t reply (Rasa X 0.27.7):

rasa-production_1 | /opt/venv/lib/python3.6/site-packages/rasa/utils/common.py:351: UserWarning: Interpreter parsed an intent 'hey' which is not defined in the domain. Please make sure all intents are listed in the domain.

rasa-production_1 | 2020-04-28 04:12:29 ERROR rasa.core.agent - An exception was raised while fetching a model. Continuing anyways...

rasa-worker_1 | 2020-04-28 04:12:29 ERROR rasa.core.agent - An exception was raised while fetching a model. Continuing anyways...

Also, why does this always show up in the logs:

rasa-worker_1 | 2020-04-28 04:23:29 ERROR rasa.core.agent - An exception was raised while fetching a model. Continuing anyways...

rasa-production_1 | 2020-04-28 04:23:29 ERROR rasa.core.agent - An exception was raised while fetching a model. Continuing anyways...

As @slcheungcasado already stated, it does work after training a new model and refreshing the page, but if I update the Rasa X version again, the chat hangs again.

@pandaxar when updating rasa-x, you will usually have to re-train the model if your model is incompatible with the new version of rasa. It looks like that is what the error is pointing to as well.
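The compatibility check amounts to comparing the version a model was trained with against the oldest version the running Rasa still accepts. A minimal Python sketch of that comparison; the function names and the example version numbers are hypothetical, and neither where Rasa stores these values nor the actual minimum for a given release is stated in this thread:

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '1.8.0' into (1, 8, 0); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def needs_retraining(model_version: str, minimum_compatible: str) -> bool:
    """True if the model is older than the minimum version the runtime accepts."""
    return parse_version(model_version) < parse_version(minimum_compatible)


# Made-up numbers, for illustration only.
print(needs_retraining("1.8.0", "1.9.0"))
```

Comparing tuples of integers rather than raw strings matters here, because as strings "1.10.0" would sort before "1.9.0".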