EDIT: Redis seems unsuitable for that use case.
I don't have experience with such scenarios, but I guess you could deploy multiple Rasa instances that all have the same model files (maybe pulled from an AWS bucket, or just plain copies), connect them to a highly available tracker store (a MongoDB cluster), and put these Rasa instances behind a load balancer.
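To make the idea concrete, here is a toy sketch in plain Python. It is not Rasa code: `SharedTrackerStore` and `ToyRasaInstance` are made-up names, the dict stands in for the MongoDB cluster, and `itertools.cycle` stands in for a round-robin load balancer. It only shows why the instances can stay interchangeable when the conversation state lives in one shared store.

```python
from itertools import cycle


class SharedTrackerStore:
    """Hypothetical stand-in for a shared, highly available tracker store."""

    def __init__(self):
        self._events = {}  # sender_id -> list of conversation events

    def load(self, sender_id):
        return list(self._events.get(sender_id, []))

    def append(self, sender_id, event):
        self._events.setdefault(sender_id, []).append(event)


class ToyRasaInstance:
    """Hypothetical stateless worker: all conversation state is in the store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, sender_id, text):
        # Load the history first, then record the new user message.
        history = self.store.load(sender_id)
        self.store.append(sender_id, f"user: {text}")
        return f"{self.name} saw {len(history)} earlier events"


store = SharedTrackerStore()
# Round-robin "load balancer" over two instances that share one store.
instances = cycle([ToyRasaInstance("rasa-a", store),
                   ToyRasaInstance("rasa-b", store)])

replies = [next(instances).handle("user-1", msg)
           for msg in ["hi", "book a table", "thanks"]]
for reply in replies:
    print(reply)
```

Each message lands on a different instance, yet every instance sees the full history so far, because none of them keeps the tracker in local memory.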
According to the MongoDB FAQ:
MongoDB is consistent by default: reads and writes are issued to the primary member of a replica set. Applications can optionally read from secondary replicas, where data is eventually consistent by default. Reads from secondaries can be useful in scenarios where it is acceptable for data to be slightly out of date, such as some reporting applications. Applications can also read from the closest copy of the data (as measured by ping distance) when latency is more important than consistency.
This means Rasa will always use the newest conversation state to predict the next action, as long as reads go to the primary (the default). Redis does not guarantee this, which might lead to a corrupted tracker state or wrong predictions.
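A rough illustration of the difference, again as a toy model rather than real MongoDB or Redis code: `ToyReplicaSet` is a made-up class with one primary and one secondary that only catches up when `replicate()` is called. Reading from the primary returns the latest tracker events; reading from the lagging copy returns an older state, which is the kind of input that could lead to a wrong next-action prediction.

```python
class ToyReplicaSet:
    """Hypothetical primary plus one asynchronously replicated secondary."""

    def __init__(self):
        self.primary = {}    # sender_id -> list of events
        self.secondary = {}  # lags behind until replicate() runs

    def write(self, sender_id, event):
        # Writes always go to the primary.
        self.primary.setdefault(sender_id, []).append(event)

    def replicate(self):
        # Copy the primary's current state to the secondary ("eventually").
        self.secondary = {k: list(v) for k, v in self.primary.items()}

    def read(self, sender_id, from_primary=True):
        source = self.primary if from_primary else self.secondary
        return list(source.get(sender_id, []))


store = ToyReplicaSet()
store.write("user-1", "user: hello")
store.replicate()
store.write("user-1", "bot: utter_greet")  # not replicated yet

fresh = store.read("user-1", from_primary=True)
stale = store.read("user-1", from_primary=False)

print(fresh)  # both events
print(stale)  # only the first event; the bot action is missing
```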