Troubleshooting deployed assistant's missing responses

I’ve just deployed a (very) minimal viable assistant (explorer-ai.chat) using Rasa Open Source. It’s currently running as a simple web app from a network of Docker containers on an E2 machine (Ubuntu 18.04) hosted by Google Cloud. Here’s my GitHub repo, which explains the bot’s purpose and contains all of the code.

If you engage in conversation with the bot, you’ll quickly note that the NLU needs a lot of work. Honestly, I think the model is very under-trained: I’m using little data, I’ve incorporated zero stories (relying only on rules for dialog management), I only trained for 30 epochs, and I didn’t really do any hyperparameter tuning. I’ve got some ideas for how I can improve in that area. What I’d really like to understand first, though, is the issue that’s illustrated here:

The bot often fails to respond to a user utterance at some point in the conversation. Here are some items that might be relevant:

  • I’m using the socket.io endpoint behind an Nginx reverse proxy. They both seem to be working fine (based on the logs), but I think one of them might be involved since…
  • This issue is nonexistent when I interact with the bot locally using rasa shell, meaning that the bot never skips a beat. Caveat: the models are different (but trained on the same data, using the same pipeline); since the app was deployed using Docker, I had to train a new model when creating my backend server (Rasa found the existing models to be “invalid”), which has this Dockerfile:
  • The issue (nearly?) always occurs at the beginning of the conversation (as illustrated above) when the app is running on localhost. The only significant difference between the Docker configurations is that I built my explorer-ai.chat backend server from the rasa/rasa:3.3.1 image, and my localhost backend server from the khalosa/rasa-aarch64:3.3.1 image (I have an M1 chip).

Here is a snippet of the logs corresponding to the conversation above:

It seems that the correct action was predicted and a BotUttered event was triggered. But there’s no activity from the web server during this time. In fact, the next log entries from the web server are these, which occur before the user’s next message.

All of my code and configurations are here.

Any idea where this problem might be stemming from?

The issue was that I didn’t have the proxy configured correctly. The /socket.io/ location needed some additional directives. This ended up working:

location /socket.io/ {
    proxy_pass          http://rasa_server:5005;
    proxy_http_version  1.1;
    proxy_set_header    Upgrade $http_upgrade;
    proxy_set_header    Connection $connection_upgrade;
}

And, in order to avoid an unknown "connection_upgrade" variable error, I also needed to add the following directive to the http block of my nginx.conf file:

map $http_upgrade $connection_upgrade {
    default   upgrade;
    ''        close;
}
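
For anyone unsure where each piece goes, the two snippets above could fit together roughly like this. This is only a minimal sketch, not my exact file: the empty events block, the listen port, and the server_name line are assumptions added so that the nesting is visible, and any TLS or static-file configuration is left out.

```nginx
# Sketch only: the events block, listen port, and server_name are assumptions.
events {}

http {
    # map is only valid in the http context, not inside server or location.
    # It sets $connection_upgrade to "upgrade" when the client sends an
    # Upgrade header, and to "close" when that header is empty.
    map $http_upgrade $connection_upgrade {
        default   upgrade;
        ''        close;
    }

    server {
        listen       80;                 # assumption
        server_name  explorer-ai.chat;   # assumption: domain taken from the post

        location /socket.io/ {
            proxy_pass          http://rasa_server:5005;
            # WebSocket upgrades require HTTP/1.1 between nginx and the upstream.
            proxy_http_version  1.1;
            # Upgrade and Connection are hop-by-hop headers, so nginx does not
            # forward them unless they are set explicitly.
            proxy_set_header    Upgrade $http_upgrade;
            proxy_set_header    Connection $connection_upgrade;
        }
    }
}
```

Running nginx -t after editing should catch a misplaced map block or an undefined variable before the proxy is reloaded.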

I think that, previously, the WebSocket upgrade couldn’t complete through the proxy, so the connection stayed on HTTP long polling (the transport socket.io starts with by default).