I am currently building an NLU system with Rasa and I noticed that inference is very slow. This is partly due to a heavy model, but it also doesn't help that inference runs sequentially, one example at a time.
This is a problem not only when using Rasa NLU on its own, but also when performing cross-validation during chatbot development. I run inference with the following code snippet. Training is significantly faster than inference here (probably because of the sequential approach):
```python
import rasa.nlu.model
from tqdm import tqdm

interpreter = rasa.nlu.model.Interpreter.load(model_path)

# Parse each example one at a time (sequential).
for index, instance in tqdm(data.items()):
    pred = interpreter.parse(instance["text"])
```
Is there a way to parallelize this? The components are evidently capable of processing batches, since batching is used during training.
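For reference, one workaround I can imagine is to fan the work out over several processes, each loading its own copy of the model. This is just a sketch, not something I've verified against every Rasa version; `model_path` and `data` are the same variables as in the snippet above, and the worker count of 4 is arbitrary:

```python
import multiprocessing as mp

import rasa.nlu.model

_interpreter = None  # one interpreter per worker process


def _init_worker(path):
    # Load the model once per worker instead of once per call.
    global _interpreter
    _interpreter = rasa.nlu.model.Interpreter.load(path)


def _parse(text):
    return _interpreter.parse(text)


if __name__ == "__main__":
    # model_path and data as in the snippet above.
    texts = [instance["text"] for instance in data.values()]
    with mp.Pool(processes=4, initializer=_init_worker,
                 initargs=(model_path,)) as pool:
        preds = pool.map(_parse, texts)
```

The obvious downside is that every worker holds a full copy of the heavy model in memory, and TensorFlow-based pipeline components may need the "spawn" start method to fork cleanly. What I'd really like is true batched inference within a single interpreter, the way training does it.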
Thanks in advance.