Will ~100k lookup values significantly slow down an entire model?

Hi! I currently have a Rasa NLU model with several intents, including weather, jokes, and hello.

The weather intent needs to access a city lookup table with ~100000 values. My questions are:

  • How much of an impact will this have in terms of performance when extracting a city entity from a weather intent?
  • Will this impact the performance of determining the intents for jokes and hello?

Personally, everything felt a teensy bit slower for me (~0.5s to 1s), but it may be a trick of the eye as I’m also calling APIs. Any insight on this is appreciated!

Hi @Talos0248, when you mention performance do you mean time to train a model or model responsiveness to incoming messages?

I meant responsiveness to incoming messages, but now that you mention it, does it affect the training time a lot too? :thinking:

In general as your lookup tables get larger they’re going to have a larger impact on training and evaluation time.

This blog post is a little older, but it does provide some benchmarking numbers around performance for different size lookup tables: Entity extraction with the new lookup table feature in Rasa NLU

With 100k elements you may experience a slightly longer evaluation time. The biggest impact will be in training your model. You’ll likely notice your model is taking longer to train with 100k elements in your lookup table.
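To give a rough feel for why size matters, here is a minimal sketch. It assumes the lookup values end up combined into one large regular expression that is run over every message. That is my reading of the linked blog post, not code taken from Rasa, and `build_lookup_pattern` is a made-up helper name.

```python
import re


def build_lookup_pattern(values):
    """Combine lookup values into a single case-insensitive alternation.

    Hypothetical helper for illustration only; not part of Rasa.
    """
    # Escape each value so characters like "." are matched literally, and
    # put longer values first so "New York" is preferred over "York".
    alternatives = sorted((re.escape(v) for v in values), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


cities = ["Berlin", "New York", "York"]  # imagine ~100k entries here
pattern = build_lookup_pattern(cities)
matches = [m.group(0) for m in pattern.finditer("Is it raining in new york or Berlin?")]
print(matches)
```

With three cities this is instant. With ~100k alternatives the pattern itself becomes very large, and it has to be built and applied during both training and message handling, which would be consistent with the slowdown described above.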


Hi @Talos0248,

figured I’d mention it here, in case you’re using the lookup tables for entity extraction. There’s a really neat trick if you’re using the lookup tables for just string matching: we can use a data structure built for keyword lookup to keep the compute costs at bay.

My colleague @fkoerner implemented a FlashTextEntityExtractor over at our rasa-nlu-examples repository. She even made a repo with a benchmark to demonstrate the speedup. I’m currently working on an algorithm whiteboard episode that will explain how it works, but if you notice a performance hit this tool will keep the compute cost at bay.
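For anyone curious about the general idea before the whiteboard episode is out, here is a minimal pure-Python sketch of trie-based keyword matching. It is not the FlashText implementation and not the FlashTextEntityExtractor API; `KeywordTrie` and its methods are hypothetical names. The point is that the text is scanned once and each character is a dictionary lookup, so the cost depends mostly on the length of the message rather than on the number of lookup values.

```python
class KeywordTrie:
    """Character trie over lookup values (illustrative, hypothetical helper)."""

    _END = "_end_"  # multi-character key, so it cannot clash with a single character

    def __init__(self):
        self.root = {}

    def add(self, keyword):
        # Walk/create one nested dict per character, then store the original value.
        node = self.root
        for ch in keyword.lower():
            node = node.setdefault(ch, {})
        node[self._END] = keyword

    def extract(self, text):
        """Return (keyword, start, end) for the longest match at each word start.

        Offsets refer to the lowercased text, which has the same length as the
        input for plain ASCII.
        """
        lowered = text.lower()
        n = len(lowered)
        found = []
        i = 0
        while i < n:
            # Only start a match at the beginning of a word.
            if i > 0 and lowered[i - 1].isalnum():
                i += 1
                continue
            node, j, best = self.root, i, None
            while j < n and lowered[j] in node:
                node = node[lowered[j]]
                j += 1
                # Keep a match only if it also ends on a word boundary.
                if self._END in node and (j == n or not lowered[j].isalnum()):
                    best = (node[self._END], i, j)
            if best:
                found.append(best)
                i = best[2]  # continue after the matched keyword
            else:
                i += 1
        return found


trie = KeywordTrie()
for city in ["New York", "York", "Berlin"]:
    trie.add(city)
print(trie.extract("What's the weather in New York and berlin?"))
```

Adding more cities makes the trie bigger in memory, but looking up a message still only follows the branches that the message's own characters select.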


Thank you for letting me know!! I’m glad that I wasn’t hallucinating when I felt things were a bit slower!

Oh, that looks like just what I need, thank you so much!!!


Nice.

Feel free to let us know if you have any feedback!


Hi, an update on the FlashTextEntityExtractor: just wanted to say that it works like a charm and the replies I get are almost instantaneous now!

I did, however, hit a small bump trying to get it installed. It seems that I had to manually install flashtext using pip install flashtext, otherwise I would get a module not found error.

There also seems to be a teensy typo in the FlashTextEntityExtractor docs, as the correct key seems to be “non_word_boundaries” instead of “non_word_boundary”.

Anyhow, thank you again! It works so smoothly and beautifully with my huge lookup table!

Thanks for letting me know!

I’ve added an issue on GitHub. I don’t know when I’ll be able to have a look, but I’m keeping track of it.
