We run Rasa NLU and Rasa Core as two separate models. Both run in Kubernetes containers next to each other, so the connection between them should be fast. There are several pods each of NLU and Core.
Rasa Core sends requests to the /model/parse endpoint of Rasa NLU. A request can take as long as 10 seconds to arrive at Rasa NLU. Rasa NLU parses the text into intents and sends the response back to Core, but the response sometimes takes 3 seconds or so to arrive at Rasa Core.
The question is: why would a request to NLU, and the response coming back from it, be delayed?
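
For reference, the round trip described above could be timed from inside a Core pod with something like the sketch below. It is only a rough sketch under some assumptions: the service name `rasa-nlu` and port 5005 are placeholders rather than our real values, the helper names are made up for illustration, and it uses only the Python standard library instead of whatever HTTP client Rasa Core uses internally.

```python
import json
import time
import urllib.request


def build_parse_request(base_url, text):
    """Build a POST request for the Rasa NLU /model/parse endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/model/parse",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_parse(base_url, text, timeout=30.0):
    """Send one parse request and return (seconds_elapsed, parsed_json)."""
    request = build_parse_request(base_url, text)
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return time.perf_counter() - start, payload


def summarize(latencies):
    """Return the min, median and max of a non-empty list of latencies."""
    ordered = sorted(latencies)
    middle = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[middle]
    else:
        median = (ordered[middle - 1] + ordered[middle]) / 2
    return {"min": ordered[0], "median": median, "max": ordered[-1]}


def measure(base_url, text="hello", attempts=20):
    """Time several parse requests in a row and summarize the latencies."""
    # base_url is an assumption, e.g. "http://rasa-nlu:5005" (placeholder).
    return summarize([timed_parse(base_url, text)[0] for _ in range(attempts)])
```

Calling `measure("http://rasa-nlu:5005")` from a Core pod, and comparing the result with the processing time that NLU itself logs for the same requests, might show whether the delay is spent on the network path or inside the NLU server.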