Running more than one Rasa server Pod on Kubernetes does not give the intended results

Description of environment: We are running the Rasa server and the Rasa action server on a Kubernetes cluster. Each server runs in its own container/Pod, and these Pods sit behind their respective services. In our test deployment we have 6 Pods, as follows:

  1. Two rasa-servers Pods
  2. Two rasa-action-servers Pods
  3. One rasa-service Pod that directs the request to rasa-server based on endpoints e.g. /webhooks
  4. Second rasa-action-service Pod directs the request to rasa-action-server based
  5. Rasa-server calls rasa-action-service via http://rasa-action-service:5055/webhook

Below is an example action that we allow users to perform via the rasa-server UI:

  1. We allow users to fill in feedback and create a Jira ticket (a call to the Jira API) from the function registered for ‘action_submit’.

Use case 1: When we run more than one Rasa server, we don’t get the intended action, such as the call to the Jira API or the wiki API. The system ends up giving unintended responses or behaviors.

Use case 2: When we reduce the number of Rasa servers to exactly 1, we start getting the intended results: the system creates the Jira ticket and returns the URL.

What are the configuration requirements when we run multiple Rasa servers and Rasa action servers on a Kubernetes cluster?

I appreciate any inputs. Thanks Rahul srivastava

Hi @rssrivast, are you also running a pod for a Lock Store? From the docs:

[using a lock store] means multiple Rasa servers can be run in parallel as replicated services, and clients do not necessarily need to address the same node when sending messages for a given conversation ID.

I think the unexpected behavior could be due to this issue.

Thanks for the details. We are not running a pod for a lock store. So if we want to run multiple Rasa servers and get the expected behavior, we also need a RedisLockStore, i.e. a Redis server running as another container, set up as described in the Lock Store document? Do we have to use the Redis database setting, db (default: 1)? Can I just use in-memory Redis and keep the db value as “0”? Also, what is the significance of key_prefix, and what value should I put for it? Can it be any alphanumeric string?

Hi Erohmensing, thanks for the details. Below is the feedback after trying the suggested steps. I need further input, as I am not getting the intended results when scaling the Rasa server to multiple instances on the same Kubernetes cluster:

Product details: Rasa Open Source, version 2.0.2. Following the RedisLockStore guidance in the Lock Store docs, I have enabled the Redis lock store. Below are the details from endpoints.yaml:

 lock_store:
         type: "redis"
         url: name-of-service
         port: 6379
         db: 1

The log data below shows that both servers are connected to the InMemoryTrackerStore and the Redis lock store:

 2021-03-31 00:43:52,607 [DEBUG] Connected to InMemoryTrackerStore.
 2021-03-31 00:43:52,608 [DEBUG] Connected to lock store ‘RedisLockStore’.

Use case: Running two Rasa servers and two action servers still does not work. Below are the logs from a run where I don’t get the intended result with action_submit:

 2021-03-31 00:47:14,420 [DEBUG] Action ‘action_submit’ ended with events ‘[BotUttered(‘Returning to the main menu.’, {“elements”: null, “quick_replies”: null, “buttons”: null, “attachment”: null, “image”: null, “custom”: null}, {}, 1617151634.4202178), BotUttered(‘Is there anything else I can help you with? You can type “change role” if you want to look for resources related to other roles, or type “change vertical” if you want to look for resources related to a particular vertical. You can also type anything that you want me to help with, including getting any resources you are looking for, leaving a feedback, or providing a testimonial.’, {“elements”: null, “quick_replies”: null, “buttons”: null, “attachment”: null, “image”: null, “custom”: null}, {“template_name”: “utter_more_question”, “response”: “utter_more_question”}, xxxxxx), <rasa.shared.core.events.SlotSet object at xxxxxxxxxx>]’.

But when I reduce the number of Rasa servers to one instance, I get the correct result, as shown in the debug log below:

 2021-03-31 01:31:11,835 [DEBUG] There is a rule for the next action ‘action_submit’.
 2021-03-31 01:31:11,835 [DEBUG] Predicted next action using policy_2_RulePolicy
 2021-03-31 01:31:11,836 [DEBUG] Predicted next action ‘action_submit’ with confidence 1.00.
 2021-03-31 01:31:11,836 [DEBUG] Calling action endpoint to run action ‘action_submit’.
 2021-03-31 01:31:15,963 [DEBUG] Action ‘action_submit’ ended with events '[BotUttered(‘Thanks for your feedback. The COE website engineers have been notified of the feedback on Slack. A Jira ticket has also been created at https://jira.com/browse/ Please go to the Jira ticket to review it and provide more details if needed.’, {“elements”: null, “quick_replies”: null, “buttons”: null, “attachment”: null, “image”: null, “custom”: null}, {}, xxxxxxxx), BotUttered(‘Is there anything else I can help you with? rasa.shared.core.events.SlotSet object at xxxxxxxx>]’.

Please advise. Thanks, Rahul Srivastava

Ah, yeah, I would expect this to happen as well if you’re using the InMemoryTrackerStore, since each server then stores its own data inside its own container and does not share it with the others. That means if one server gets the first message and another gets the second, the second server will treat that message as if it were the first one in the conversation. I’d recommend having a separate database container for your tracker store (you can similarly define the connection parameters in the endpoints.yml of each server).
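A minimal sketch of what such a tracker store entry might look like in endpoints.yml, assuming Redis is used as the database. The service name `rasa-tracker-redis` and the database number are hypothetical placeholders, not values from this thread; the keys shown (`type`, `url`, `port`, `db`) are the ones listed for the Redis tracker store in the Rasa docs, so check them against the docs for your Rasa version.

```yaml
# endpoints.yml (sketch): shared tracker store so all Rasa servers see the same conversation state
tracker_store:
  type: redis
  url: rasa-tracker-redis   # hypothetical Kubernetes service name for the Redis Pod
  port: 6379
  db: 0                     # assumption: any database number not used for other data
```

Every Rasa server replica would load the same file, so whichever replica receives a message can read the conversation history written by the others.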

Thanks, Erohmensing. In my current implementation, where I am getting inconsistent results, I have only one separate container for the Redis instance, with the details below:

 lock_store:
         type: "redis"
         url: name-of-service
         port: 6379
         db: 1

The log data below shows that both servers are connected to the InMemoryTrackerStore and the Redis lock store:

 2021-03-31 00:43:52,607 [DEBUG] Connected to InMemoryTrackerStore.
 2021-03-31 00:43:52,608 [DEBUG] Connected to lock store ‘RedisLockStore’.

Questions:-

  1. When you say “I’d recommend having a separate database container for your tracker store”, do I need a second instance/container of the Redis Lock Store running separately with a different URL?
  2. When you say “you can similarly define the connection parameters in the endpoints.yml of each server”, can you please give me an example or documentation on how to define two sets of parameters in **endpoints.yml**?
  3. How can I force each Rasa server to connect to only one, different Redis Lock Store? Which parameter will let me enforce this?

I appreciate your inputs. Thanks Rahul

The Lock Store and the Tracker store serve different purposes which you can read about in the docs. If you want, you can use the Redis container as your tracker store as well by connecting to the RedisTrackerStore in a different database than your lock store, or you can use one of the others as you see fit.
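As an illustration of the single-container option, here is a sketch of an endpoints.yml in which both stores point at the same Redis service but use different databases. The lock store values are the ones posted earlier in the thread; the tracker store's database number `2` is an assumption chosen only to differ from the lock store's.

```yaml
# endpoints.yml (sketch): one Redis instance, two logical databases
lock_store:
  type: "redis"
  url: name-of-service   # placeholder used earlier in this thread
  port: 6379
  db: 1

tracker_store:
  type: redis
  url: name-of-service   # same Redis service as the lock store
  port: 6379
  db: 2                  # assumption: any database number other than the lock store's
```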

How I can force each rasa server to connect to only one and different Redis Lock Store? Which parameter will let me enforce this?

I’m not sure what you mean by this. All of the Rasa servers should connect to the same lock store; otherwise it wouldn’t be able to serve its purpose of keeping all of the servers in sync.

Hi Erohmensing, after making the suggested changes below, my environment with multiple Rasa server instances on Kubernetes started working.

Below were the changes I made:-

  • Created one redis container for Lock Store
  • Created second redis container for tracker store
  • Updated endpoints yaml with details on lock and tracker store.
  • Started rasa server replicas that use the updated endpoints yaml

Now I am getting expected results and they are fast.
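The actual endpoints yaml was not posted, but a setup like the one described above could be sketched as follows. The two Redis service names (`redis-lock-store`, `redis-tracker-store`) and the database numbers are hypothetical; the action endpoint URL is the one given at the top of the thread.

```yaml
# endpoints.yml (sketch): separate Redis containers for the lock store and the tracker store
action_endpoint:
  url: "http://rasa-action-service:5055/webhook"

lock_store:
  type: "redis"
  url: redis-lock-store      # hypothetical service name of the lock store Redis
  port: 6379
  db: 1

tracker_store:
  type: redis
  url: redis-tracker-store   # hypothetical service name of the tracker store Redis
  port: 6379
  db: 0                      # assumption: database number chosen for illustration
```

All rasa server replicas share this one file, so they all point at the same lock store and the same tracker store.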

Thanks a lot for help Rahul Srivastava

I’m happy I could help!

@erohmensing Hi, I deployed the same setup on Docker Swarm, with Postgres as the tracker store instead of Redis, and Redis as the lock store. The system works as expected, but after some time, especially if it is left inactive for a while (30 minutes, for example), it responds as if there is no conversation tracking. If we then repeat the use case, it responds as expected again. Is there any explanation for that? I suspect it might be Postgres, or am I missing something? Your quick reply is highly appreciated. Thanks
