Rasa X Chat UI Hangs + Trouble understanding docker-compose logs errors

I got custom actions to work on Rasa X, but the chat hangs on every input. Oddly, once I refresh the web browser and go to the Conversations tab, I see that the output is generally correct and that the API for the custom action is called successfully.

I checked out the docker-compose logs and found errors in three parts: (1) rasa_x, (2) rasa_production, and (3) rasa_worker. Here are the error logs:

rasa_x

rasa-x_1           | Starting Rasa X server... 🚀
rasa-x_1           | INFO:rasax.community.services.integrated_version_control.git_service:Cloning git repository from URL 'git@github.com:slcheungcasado/rasa-medbot.git'.
rasa-x_1           | Exception occurred while handling uri: 'http://34.92.254.220/api/projects/default/git_repositories/2/status'
rasa-x_1           | Traceback (most recent call last):
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/sanic/app.py", line 976, in handle_request
rasa-x_1           |     response = await response
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/api/decorators.py", line 204, in decorated_function
rasa-x_1           |     return await await_and_return_response(args, kwargs, request)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/api/decorators.py", line 134, in await_and_return_response
rasa-x_1           |     response = await response
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/api/blueprints/git.py", line 151, in get_repository_status
rasa-x_1           |     repository_status = git_service.get_repository_status()
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/services/integrated_version_control/git_service.py", line 716, in get_repository_status
rasa-x_1           |     is_remote_ahead = self.is_remote_branch_ahead()
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/services/integrated_version_control/git_service.py", line 553, in is_remote_branch_ahead
rasa-x_1           |     number_of_commits_behind = sum(1 for _ in commits_behind)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/rasax/community/services/integrated_version_control/git_service.py", line 553, in <genexpr>
rasa-x_1           |     number_of_commits_behind = sum(1 for _ in commits_behind)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/git/objects/commit.py", line 277, in _iter_from_process_or_stream
rasa-x_1           |     finalize_process(proc_or_stream)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/git/util.py", line 328, in finalize_process
rasa-x_1           |     proc.wait(**kwargs)
rasa-x_1           |   File "/usr/local/lib/python3.6/site-packages/git/cmd.py", line 408, in wait
rasa-x_1           |     raise GitCommandError(self.args, status, errstr)
rasa-x_1           | git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
rasa-x_1           |   cmdline: git rev-list master..origin/master --
rasa-x_1           |   stderr: 'fatal: bad revision 'master..origin/master'
rasa-x_1           | '

rasa_production

rasa-production_1  | 2020-03-24 08:20:41.006943: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
rasa-production_1  | 2020-03-24 08:20:41.007349: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
rasa-production_1  | 2020-03-24 08:20:41.007393: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
rasa-production_1  | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=21, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.19.0.8', 36088)>; error=111 (Connection refused)
rasa-production_1  | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.19.0.4', 5672))
rasa-production_1  | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-production_1  | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=25, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.19.0.8', 36104)>; error=111 (Connection refused)
rasa-production_1  | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.19.0.4', 5672))
rasa-production_1  | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-production_1  | /opt/venv/lib/python3.6/site-packages/rasa/utils/common.py:347: UserWarning: Interpreter parsed an intent 'hi' which is not defined in the domain. Please make sure all intents are listed in the domain.
rasa-production_1  |   More info at https://rasa.com/docs/rasa/core/domains/
rasa-production_1  | 2020-03-24 08:25:02 ERROR    rasa.core.brokers.pika  - Could not open Pika channel at host 'rabbit'. Failed with error: Channel is closed.
rasa-production_1  | 2020-03-24 08:25:21.332683: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
rasa-production_1  | 2020-03-24 08:33:26.038624: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
rasa-production_1  | 2020-03-24 08:33:26.040372: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
rasa-production_1  | 2020-03-24 08:33:26.040535: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
rasa-production_1  | 2020-03-24 08:33:53.316035: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

rasa_worker

rasa-worker_1      | 2020-03-24 08:20:40.652898: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
rasa-worker_1      | 2020-03-24 08:20:40.653426: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
rasa-worker_1      | 2020-03-24 08:20:40.653472: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
rasa-worker_1      | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=21, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.19.0.7', 36370)>; error=111 (Connection refused)
rasa-worker_1      | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.19.0.4', 5672))
rasa-worker_1      | 2020-03-24 08:20:45 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-worker_1      | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=25, family=AddressFamily.AF_INET, type=2049, proto=6, laddr=('172.19.0.7', 36392)>; error=111 (Connection refused)
rasa-worker_1      | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.19.0.4', 5672))
rasa-worker_1      | 2020-03-24 08:20:50 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-worker_1      | 2020-03-24 08:22:40.292134: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
rasa-worker_1      | /opt/venv/lib/python3.6/site-packages/rasa/core/policies/ensemble.py:310: FutureWarning: 'KerasPolicy' is deprecated and will be removed in version 2.0. Use 'TEDPolicy' instead.
rasa-worker_1      |   policy_object = constr_func(**policy)
rasa-worker_1      | /opt/venv/lib/python3.6/site-packages/rasa/nlu/config.py:50: FutureWarning: You are using a pipeline template. All pipelines templates are deprecated and will be removed in version 2.0. Please add the components you want to use directly to your configuration file.
rasa-worker_1      |   return RasaNLUModelConfig(config)
rasa-worker_1      | /opt/venv/lib/python3.6/site-packages/rasa/utils/common.py:347: UserWarning: 'CRFEntityExtractor' is deprecated and will be removed in version 2.0. Use 'DIETClassifier' instead.
rasa-worker_1      |   More info at https://rasa.com/docs/rasa/migration-guide/
rasa-worker_1      | 2020-03-24 08:22:51 WARNING  rasa.nlu.classifiers.diet_classifier  - Please configure the number of 'epochs' in your configuration file. We will change the default value of 'epochs' in the future to 1. 
rasa-worker_1      | 2020-03-24 08:33:26.035055: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
rasa-worker_1      | 2020-03-24 08:33:26.035448: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
rasa-worker_1      | 2020-03-24 08:33:26.035492: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
rasa-worker_1      | 2020-03-24 08:33:53.316035: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

The rasa_x logs indicate something is wrong with the Git integration with my GitHub repo, but it pulled everything just fine.
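One possible reading of the "bad revision 'master..origin/master'" error is that one of the two refs does not exist in the clone Rasa X made (for example, if the repository's default branch is not called master). A minimal sketch to check this, assuming Python and git are both available wherever the clone lives. The repository path is a placeholder, since the logs do not show where the clone is stored, and the function names are made up for illustration:

```python
import subprocess


def ref_exists(repo_dir, ref, run=subprocess.run):
    """Return True if `ref` resolves in the repository at `repo_dir`."""
    # `git rev-parse --verify --quiet <ref>` exits non-zero when the ref is missing.
    result = run(
        ["git", "-C", repo_dir, "rev-parse", "--verify", "--quiet", ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def missing_refs(repo_dir, refs=("master", "origin/master"), run=subprocess.run):
    """List the refs from the failing `git rev-list` call that do not resolve."""
    return [ref for ref in refs if not ref_exists(repo_dir, ref, run)]


# Usage (the path is hypothetical, replace it with the real clone location):
# print(missing_refs("/path/to/cloned/repo"))
```

If this prints a non-empty list, the listed refs are the ones `git rev-list master..origin/master` could not find, which would point at a branch naming mismatch rather than a failed pull.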

Both the rasa_production and rasa_worker errors mention a refused connection and something to do with TensorFlow. They also mention GPU usage, but based on the GCP documentation, the zone I chose for my VM instance, asia-east2-b, doesn’t have GPUs available (I also double-checked my instance’s machine configuration).
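The "Connection refused" lines all target port 5672 (the AMQP port) within a few seconds of startup, and a later log line names the broker host as 'rabbit'. One plausible explanation is that the Rasa containers simply started before the broker was accepting connections. A minimal sketch to check whether the broker becomes reachable, assuming it is run from a container on the same Docker network; the function name and the timeout values are illustrative choices, not anything from Rasa:

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError)
            # while nothing is listening on the port yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage, with host and port taken from the logs above:
# print(wait_for_port("rabbit", 5672))
```

If this returns True after a short wait, the early connection errors are probably just startup ordering and not the cause of the hanging chat.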

Edit:

Virtual Machine Details:

  • n1-standard-2 (2 vCPUs, 7.5 GB memory)

  • 100 GB Disk

  • Ubuntu 18

RASA_X_VERSION=0.26.1

RASA_VERSION=1.8.0

RASA_X_DEMO_VERSION=0.26.0

I don’t think the mismatched RASA_X_VERSION versus RASA_X_DEMO_VERSION matters since RASA_X_DEMO_VERSION is only used in the app service for the docker-compose.yml and that is being overridden for the custom action server image.

Python 3.6.9

Docker version 19.03.8, build afacb8b7f0

docker-compose version 1.25.4, build unknown

Some guidance would be greatly appreciated.

I have managed to narrow down the log to these two messages:

rasa-worker_1      | 2020-03-24 10:44:43.737337: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

This occurs when I first press Train.

rasa-production_1  | 2020-03-24 10:48:22.457976: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

This one occurs when I first send a message using the interactive learning UI.

I found that I need to press Train a second time to get Rasa X to actually train a model. Likewise, in any interactive learning conversation the first message gets swallowed, but from the second message onward the bot behaves as intended.

I will be setting up another VM instance in a zone that has GPU available to see if this makes any difference, but I’m just guessing at this point. :pensive:
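To test whether the GPU messages matter at all, one option is to strip them from the log and see what is left. The sketch below assumes the TensorFlow messages about libnvinfer, TensorRT and cuInit are only TensorFlow probing for a GPU on a CPU-only VM, which is an assumption and not something the logs prove. The marker strings are copied from the logs above, and the function name is made up for illustration:

```python
# Substrings of the TensorFlow GPU-probe messages seen in the logs above.
GPU_NOISE = (
    "libnvinfer",
    "Cannot dlopen some TensorRT libraries",
    "failed call to cuInit",
)


def interesting_lines(lines):
    """Keep error-like log lines, skipping the assumed GPU-probe noise."""
    kept = []
    for line in lines:
        if any(marker in line for marker in GPU_NOISE):
            continue  # assumed harmless on a machine without a GPU
        if " ERROR " in line or "Traceback" in line or "Exception" in line:
            kept.append(line.rstrip("\n"))
    return kept


# Usage: save the output of `docker-compose logs` to a file first, then:
# with open("compose.log") as f:
#     print("\n".join(interesting_lines(f)))
```

Whatever survives the filter (for example the pika and git errors) is what is worth looking into first.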

Hi @slcheungcasado, I’ve seen these types of errors too:

E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)

But I haven’t yet seen it actually cause an error. Does the talk to your assistant page always ignore the first message, or does it work if you refresh the page before starting/give it a few seconds first?

I tried giving it about a minute or two, but that doesn’t seem to make a difference. It seems to work if I train a new model, set that model as active, and then I refresh the page prior to using talk to your assistant.

On another note, if I stop the server and start it back up, the chat UI behaves worse: the bot doesn’t reply at all (but again, if I refresh, the recorded conversation is correct). This behavior goes away if I train a new model, set it as active, refresh, and then chat with the newly trained bot.


Thanks for the feedback. I’ve seen similar behavior but not sure what causes it. I’ll pass this on as feedback to the development team :+1:

Re. the chat hanging issue - this is resolved in rasa-x==0.27


@mloubser Hi, it seems I have the same issue: the chat UI hangs and the chatbot doesn’t reply (Rasa X 0.27.7):

rasa-production_1 | /opt/venv/lib/python3.6/site-packages/rasa/utils/common.py:351: UserWarning: Interpreter parsed an intent 'hey' which is not defined in the domain. Please make sure all intents are listed in the domain.

rasa-production_1 | 2020-04-28 04:12:29 ERROR rasa.core.agent - An exception was raised while fetching a model. Continuing anyways...

rasa-worker_1 | 2020-04-28 04:12:29 ERROR rasa.core.agent - An exception was raised while fetching a model. Continuing anyways...

Also, why does this always show up in the logs:

rasa-worker_1 | 2020-04-28 04:23:29 ERROR rasa.core.agent - An exception was raised while fetching a model. Continuing anyways...

rasa-production_1 | 2020-04-28 04:23:29 ERROR rasa.core.agent - An exception was raised while fetching a model. Continuing anyways...

As @slcheungcasado already stated, it does work after training a new model and refreshing the page, but if I update the Rasa X version again, the chat hangs again.

@pandaxar when updating rasa-x, you will usually have to re-train the model if your model is incompatible with the new version of rasa. It looks like that is what the error is pointing to as well.