I am using Rasa as a Python library and loading two agents, each with a different model of around 60 intents. Both models are kept in memory.
A single request takes 2-4 seconds. When I send 100 parallel requests from JMeter with a 10-second ramp-up period, response times range from 2 seconds up to 120 seconds, with an average of 60 seconds.
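One rough way to read these numbers (a hypothesis, not a diagnosis): if requests are effectively processed one at a time, each new JMeter request arrives every 0.1 s but waits behind everything queued before it, so latency grows linearly with position in the queue. The sketch below is a plain-Python FIFO queue simulation with no Rasa calls; `simulate_fifo`, the even arrival spacing, and the 1.3 s per-request service time are all assumptions picked for illustration. That service time happens to reproduce roughly the reported max and average, even though it is shorter than the 2-4 s measured for a lone request, so treat it only as a model of the queueing effect.

```python
import heapq


def simulate_fifo(n_requests, ramp_up_s, service_s, workers=1):
    """Hypothetical model: per-request latencies (seconds) for a FIFO queue
    served by `workers` identical workers, each needing `service_s` seconds
    per request. Requests arrive evenly spaced over `ramp_up_s`, similar to
    JMeter threads started across a ramp-up period."""
    interval = ramp_up_s / n_requests
    free_at = [0.0] * workers  # min-heap of times at which each worker is free
    heapq.heapify(free_at)
    latencies = []
    for i in range(n_requests):
        arrival = i * interval
        earliest_free = heapq.heappop(free_at)  # next worker to become idle
        start = max(arrival, earliest_free)     # wait in queue if all are busy
        finish = start + service_s
        heapq.heappush(free_at, finish)
        latencies.append(finish - arrival)      # queueing delay + service time
    return latencies


for workers in (1, 4):
    lat = simulate_fifo(100, 10.0, 1.3, workers)
    print(f"workers={workers}: min={min(lat):.1f}s "
          f"max={max(lat):.1f}s avg={sum(lat) / len(lat):.1f}s")
# workers=1: min=1.3s max=120.1s avg=60.7s
# workers=4: min=1.3s max=22.9s avg=12.1s
```

The `workers=4` line assumes each extra worker runs truly in parallel (for example, on its own CPU core) at the same per-request cost; if the work is CPU-bound and shares one core or one event loop, that assumption does not hold.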
What general solutions can be applied to reduce the response time? Are there any suggestions for training the model? Even a single request takes 2-4 seconds.