I trained a Rasa model and have been testing it with rasa shell. I measured about 2.5 seconds from when I send a message to when I get a response on GPU, or around 2.8 s on CPU. This is concerning to me, considering I have top-notch hardware (i9-10900K CPU, RTX 3090 GPU) and my model is currently very simple. Is there something I'm doing wrong, or some way I can speed up the model's inference time? I noticed that in the Rasa Playground the responses are near-instant.
Training models is generally what takes time. Once it's done, prediction/serving is usually near-instant, and the hardware/GPU has much less impact there.
It seems like something else is going on, not training. Is there a loop somewhere, or something else happening?
I don't think so. For testing purposes I created an entirely new project using rasa init, and did it outside my current Python virtual environment, directly on my machine, to make sure it could use machine resources properly. Even without modifying the init project at all, it still takes 2+ seconds to get a response back. Is there perhaps a big difference in inference time between rasa shell and running Rasa as a server? I'm glad to hear that it should be faster, so at least I know Rasa should work for my application if I can figure out what's going on.
I've done a bunch of testing and still haven't fixed the problem, but here's what I've learned: for some reason it's specific to my PC (though it can't be a hardware issue). I tried a Windows restart, uninstalling and reinstalling Python, running in both Command Prompt and PowerShell, and creating a new Rasa project, and all of these still result in a 2.5 s delay before a response. I ran rasa shell with --debug and noticed that during the 2.5 s window there are no debug messages whatsoever; then all of the debug messages appear at once after the 2.5 s. I installed Rasa on my MacBook and the responses are instant there. Any ideas what could be wrong or what I could try?
I put a ton of debug statements in the code, and I see this debug message first ("debug 22" in record_messages in console.py):
```python
bot_responses_stream = _send_message_receive_stream(
    server_url, auth_token, sender_id, text, request_timeout=request_timeout
)
previous_response = None
logger.debug("debug 22")
async for response in bot_responses_stream:
    logger.debug("tyler debug 24")
    if previous_response is not None:
        _print_bot_output(previous_response)
    previous_response = response
```
Then, after about 2.5 seconds, I see the next debug message here ("debug 14" in the handler registered by register in channel.py):
```python
def register(
    input_channels: List["InputChannel"], app: Sanic, route: Optional[Text]
) -> None:
    """Registers input channel blueprints with Sanic."""

    async def handler(message: UserMessage) -> None:
        logger.debug("debug 14")
        await app.ctx.agent.handle_message(message)
```
Use free -m or top to see whether your server has enough RAM. If RAM is low, Rasa will respond slowly.
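On Linux, the checks suggested above can be run like this (a quick sketch; column layout varies slightly between distributions):

```shell
# Show total, used, and free memory in megabytes.
free -m

# One-shot, non-interactive snapshot of the busiest processes
# (-b = batch mode, -n 1 = a single iteration).
top -b -n 1 | head -20
```

If the "available" column in `free -m` is close to zero, swapping could explain slow responses.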
It isn't a RAM issue; I have 64 GB of RAM available. Also, I noticed that it seems to be significantly faster when running Rasa as a server rather than through rasa shell.
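To put a number on the shell-vs-server comparison, here's a minimal timing sketch against Rasa's REST channel endpoint (/webhooks/rest/webhook on the default port 5005). It assumes you have a model served locally via `rasa run` with the `rest` channel enabled in credentials.yml; the function and sender names are made up for illustration.

```python
import json
import time
from urllib.request import Request, urlopen

# Default port for `rasa run`; adjust if you pass -p/--port.
REST_WEBHOOK = "http://localhost:5005/webhooks/rest/webhook"


def build_payload(sender: str, text: str) -> bytes:
    """Build the JSON body the REST channel expects."""
    return json.dumps({"sender": sender, "message": text}).encode("utf-8")


def time_message(text: str, sender: str = "timing-test") -> float:
    """Send one message to the running bot and return round-trip latency in seconds."""
    request = Request(
        REST_WEBHOOK,
        data=build_payload(sender, text),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urlopen(request) as response:
        response.read()  # wait for the full reply before stopping the clock
    return time.perf_counter() - start
```

With the server up, something like `print(f"{time_message('hello'):.3f}s")` gives a latency figure you can compare directly against what you see in rasa shell.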
Yeah, and if you train on a server, it's also faster than training locally.
rasa shell is mainly meant for testing.
A simple training run on a similar machine config takes about 8 seconds. You could check the Rasa model server's logs to see what is slowing things down.