Will ~100k lookup values significantly slow down an entire model?

Hi! I currently have a Rasa NLU model with several intents, including weather, jokes, and hello.

The weather intent needs to access a city lookup table with ~100000 values. My questions are:

  • How much of an impact will this have in terms of performance when extracting a city entity from a weather intent?
  • Will this impact the performance of determining the intents for jokes and hello?

Personally, everything felt a teensy bit slower for me (~0.5s to 1s), but it may be a trick of the eye as I’m also calling APIs. Any insight on this is appreciated!

Hi @Talos0248, when you mention performance do you mean time to train a model or model responsiveness to incoming messages?

I meant responsiveness to incoming messages, but now that you mention it, does it affect the training time a lot too? :thinking:

In general, as your lookup tables get larger, they’re going to have a larger impact on training and evaluation time.

This blog post is a little older, but it does provide some benchmarking numbers for different lookup table sizes: Entity extraction with the new lookup table feature in Rasa NLU

With 100k elements you may see slightly longer evaluation times, but the biggest impact will be on training: you’ll likely notice your model takes noticeably longer to train with a lookup table that size.
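If you want a rough feel for why the table size matters, here’s a toy timing sketch in plain Python. It is not Rasa’s actual featurization code, it just mimics the idea of folding a lookup table into one big regex pattern:

```python
import re
import time

def build_pattern(n):
    # pretend lookup table: n made-up city names folded into one alternation pattern
    values = (f"city{i}" for i in range(n))
    return "|".join(re.escape(v) for v in values)

message = "what is the weather in city99999 today?"

for n in (1_000, 10_000, 100_000):
    start = time.perf_counter()
    pattern = re.compile(build_pattern(n), re.IGNORECASE)
    compiled = time.perf_counter() - start

    start = time.perf_counter()
    pattern.findall(message)
    matched = time.perf_counter() - start

    print(f"{n:>7} values: compile {compiled:.3f}s, match {matched:.4f}s")
```

The exact numbers won’t match what Rasa does internally, but the trend (work growing with the size of the table) is the part that carries over to training and evaluation.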

Hi @Talos0248,

figured I’d mention this here in case you’re using the lookup tables for entity extraction. If plain string matching is all you need, there’s a really neat trick: we can use a data structure that keeps the compute costs at bay.

My colleague @fkoerner implemented a FlashTextEntityExtractor over at our rasa-nlu-examples repository, and she even put together a benchmark repo to demonstrate the speedup. I’m currently working on an algorithm whiteboard episode that explains how it works, but if you’re noticing a performance hit, this tool should help.
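To give a rough idea of what the underlying flashtext library does outside of Rasa (it keeps the keywords in a trie, so matching a message doesn’t get slower as the table grows), here’s a minimal sketch with made-up city names:

```python
from flashtext import KeywordProcessor

# ~100k made-up city names plus one real one
cities = KeywordProcessor(case_sensitive=False)
cities.add_keywords_from_list([f"city{i}" for i in range(100_000)])
cities.add_keyword("new york")

# extraction scans the message once against the keyword trie,
# so the cost scales with the message length, not the table size
print(cities.extract_keywords("What is the weather in New York today?"))
# ['new york']
```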

Thank you for letting me know!! I’m glad that I wasn’t hallucinating when I felt things were a bit slower!

Oh, that looks like just what I need, thank you so much!!!

Nice.

Feel free to let us know if you have any feedback!

Hi, an update on the FlashTextEntityExtractor: just wanted to say that it works like an absolute charm, and the replies I get are almost instantaneous now!

I did, however, hit a small bump trying to get it installed. It seems that I had to manually install flashtext using pip install flashtext, otherwise I would get a “module not found” error.

There also seems to be a teensy typo in the FlashTextEntityExtractor docs: the correct key appears to be “non_word_boundaries” rather than “non_word_boundary”.
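In case it helps anyone else reading along: as far as I can tell, the option maps onto flashtext’s own non_word_boundaries setting, which controls which characters count as part of a word. A quick check in plain flashtext:

```python
from flashtext import KeywordProcessor

kp = KeywordProcessor(case_sensitive=False)
kp.add_keyword("york")

# by default '-' is a word boundary, so 'york' is found inside 'new-york'
print(kp.extract_keywords("I flew to new-york"))  # ['york']

# treat '-' as part of a word and the partial match goes away
kp.add_non_word_boundary('-')
print(kp.extract_keywords("I flew to new-york"))  # []
```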

Anyhow, thank you again! It works so smoothly and beautifully with my huge lookup table!

Thanks for letting me know!

I’ve added an issue on GitHub. I don’t know when I’ll be able to have a look, but I’m keeping track of it.

How does Rasa use lookup tables in training? Do all of the elements of the table go through a RegexFeaturizer, with weights trained on those features? And do the elements of the lookup table also automatically get added as examples for the entity (as if we had added the lookup table to our training examples as well)?

This depends on which components you’re using! But yes, for the RegexFeaturizer that’s the case. The elements of the lookup table are extracted as entities by the RegexEntityExtractor. I guess in a way they are added as examples, but they will only ever be visible to the RegexEntityExtractor.
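To make the two roles a bit more concrete, here’s a toy sketch of the idea (not Rasa’s actual implementation, just the gist of a lookup table being folded into one regex pattern):

```python
import re

# toy stand-in for a lookup table compiled into one alternation pattern
lookup = ["berlin", "london", "new york"]
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(city) for city in lookup) + r")\b",
    re.IGNORECASE,
)

message = "What's the weather in New York?"

# RegexFeaturizer-style: a match only adds a feature flag that trainable
# components downstream learn weights for
has_lookup_match = bool(pattern.search(message))   # True

# RegexEntityExtractor-style: the match itself is returned as the entity
entities = pattern.findall(message)                # ['New York']

print(has_lookup_match, entities)
```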
