OpenShift Kubernetes Multiple Pod Model Selection

Hi everyone. We modified RASA 2.3.3 and use it for NLP. We are having difficulties with response time and TPS on a single instance. We created a second pod on OpenShift. It seems to have helped a little, but not much. So we tried using Redis as the lock store and setting the number of workers to 10 to see what would happen. Nothing actually changed, because we are using only NLU, not NLG. So my understanding is that if the problem is NLU TPS, we have to run more pods on Kubernetes. I will try some configurations to find the optimum CPU, RAM, and pod count. If you have any insight, it is welcome.

On the other hand, increasing the number of pods creates a problem. We use one pod for training and one for prediction (the production pod, let's say). Whenever we train, we can select the model and upload it to S3, then go to the production pod, download the model, and activate it there using our frontend app. But when I send an HTTP request to RASA, Kubernetes directs it to one of the pods at random, so the model is downloaded to and activated on that pod only. To get the same model downloaded and activated on every pod, we have to repeat the request many times until we have hit the pods that were missed.

I have searched the internet on this topic and there are some options, but I have doubts about them. So what is the standard (best-practice) approach for this operation? Does anyone have any ideas?

When each Rasa OSS production pod starts, you can configure it to use a model server. When you have a new model, you put it on the model server and each of your production OSS pods will automatically pull the new model when it shows up.
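To make the model-server idea a bit more concrete, here is a minimal sketch in Python of what such a server could look like. It relies on the behaviour described in the Rasa docs: each pod polls the URL set under `models.url` in its `endpoints.yml`, sends its current model hash in an `If-None-Match` header, and expects either the new model archive with an `ETag` header or an empty 204/304 response. The file path, the port, the use of SHA-256 as the hash, and the helper names (`model_etag`, `plan_response`, `serve`) are all assumptions made for illustration, not part of Rasa.

```python
import hashlib
from http.server import BaseHTTPRequestHandler, HTTPServer
from pathlib import Path

# Hypothetical location of the currently active model archive.
MODEL_PATH = Path("models/current.tar.gz")


def model_etag(data: bytes) -> str:
    """Fingerprint of the model archive (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(data).hexdigest()


def plan_response(model_bytes: bytes, if_none_match):
    """Decide what to answer a polling pod: (status, headers, body)."""
    etag = model_etag(model_bytes)
    if if_none_match == etag:
        # The pod already has this model, so send nothing.
        return 304, {}, b""
    headers = {"ETag": etag, "Content-Type": "application/gzip"}
    return 200, headers, model_bytes


class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers, body = plan_response(
            MODEL_PATH.read_bytes(), self.headers.get("If-None-Match")
        )
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        if body:
            self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the sketch server until interrupted."""
    HTTPServer(("0.0.0.0", port), ModelHandler).serve_forever()
```

With something like this running behind a Kubernetes Service, activating a model would mean replacing the file at `MODEL_PATH` once; every production pod should then pick it up on its next poll, without the frontend having to reach each pod individually.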

Thank you. We will check it.