Will ~100k lookup values significantly slow down an entire model?

Hi! I currently have a Rasa NLU model with several intents, including weather, jokes, and hello.

The weather intent needs to access a city lookup table with ~100000 values. My questions are:

  • How much of an impact will this have in terms of performance when extracting a city entity from a weather intent?
  • Will this impact the performance of determining the intents for jokes and hello?

Personally, everything felt a teensy bit slower for me (~0.5s to 1s), but it may be a trick of the eye as I’m also calling APIs. Any insight on this is appreciated!

Hi @Talos0248, when you mention performance do you mean time to train a model or model responsiveness to incoming messages?

I meant responsiveness to incoming messages, but now that you mention it, does it affect the training time a lot too? :thinking:

In general, as your lookup tables get larger, they’re going to have a larger impact on training and evaluation time.

This blog post is a little older, but it does provide some benchmarking numbers for different lookup table sizes: Entity extraction with the new lookup table feature in Rasa NLU

With 100k elements you may see slightly longer evaluation times, but the biggest impact will be on training: you’ll likely notice your model takes noticeably longer to train with a lookup table that size.
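If you want a rough feel for why the table size matters, here’s a toy timing sketch in plain Python. It is not Rasa’s actual featurization code, it just mimics the idea of folding a lookup table into one big regex pattern:

```python
import re
import time

def build_pattern(n):
    # pretend lookup table: n made-up city names folded into one alternation pattern
    values = (f"city{i}" for i in range(n))
    return "|".join(re.escape(v) for v in values)

message = "what is the weather in city99999 today?"

for n in (1_000, 10_000, 100_000):
    start = time.perf_counter()
    pattern = re.compile(build_pattern(n), re.IGNORECASE)
    compiled = time.perf_counter() - start

    start = time.perf_counter()
    pattern.findall(message)
    matched = time.perf_counter() - start

    print(f"{n:>7} values: compile {compiled:.3f}s, match {matched:.4f}s")
```

The exact numbers won’t match what Rasa does internally, but the trend (work growing with the size of the table) is the part that carries over to training and evaluation.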

Hi @Talos0248,

figured I’d mention this here in case you’re using the lookup tables for entity extraction. If plain string matching is all you need, there’s a really neat trick: we can use a data structure that keeps the compute costs at bay.

My colleague @fkoerner implemented a FlashTextEntityExtractor over at our rasa-nlu-examples repository, and she even put together a benchmark repo to demonstrate the speedup. I’m currently working on an algorithm whiteboard episode that explains how it works, but if you’re noticing a performance hit, this tool should help.
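To give a rough idea of what the underlying flashtext library does outside of Rasa (it keeps the keywords in a trie, so matching a message doesn’t get slower as the table grows), here’s a minimal sketch with made-up city names:

```python
from flashtext import KeywordProcessor

# ~100k made-up city names plus one real one
cities = KeywordProcessor(case_sensitive=False)
cities.add_keywords_from_list([f"city{i}" for i in range(100_000)])
cities.add_keyword("new york")

# extraction scans the message once against the keyword trie,
# so the cost scales with the message length, not the table size
print(cities.extract_keywords("What is the weather in New York today?"))
# ['new york']
```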

Thank you for letting me know!! I’m glad that I wasn’t hallucinating when I felt things were a bit slower!

Oh, that looks like just what I need, thank you so much!!!

Nice.

Feel free to let us know if you have any feedback!

Hi, an update on the FlashTextEntityExtractor: just wanted to say that it works like an absolute charm, and the replies I get are almost instantaneous now!

I did, however, hit a small bump trying to get it installed. It seems that I had to manually install flashtext using pip install flashtext, otherwise I would get a “module not found” error.

There also seems to be a teensy typo in the FlashTextEntityExtractor docs: the correct key appears to be “non_word_boundaries” rather than “non_word_boundary”.
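In case it helps anyone else reading along: as far as I can tell, the option maps onto flashtext’s own non_word_boundaries setting, which controls which characters count as part of a word. A quick check in plain flashtext:

```python
from flashtext import KeywordProcessor

kp = KeywordProcessor(case_sensitive=False)
kp.add_keyword("york")

# by default '-' is a word boundary, so 'york' is found inside 'new-york'
print(kp.extract_keywords("I flew to new-york"))  # ['york']

# treat '-' as part of a word and the partial match goes away
kp.add_non_word_boundary('-')
print(kp.extract_keywords("I flew to new-york"))  # []
```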

Anyhow, thank you again! It works so smoothly and beautifully with my huge lookup table!

Thanks for letting me know!

I’ve added an issue on GitHub. I don’t know when I’ll be able to have a look, but I’m keeping track of it.

How does Rasa use lookup tables in training? Do all of the elements of the table go through a RegexFeaturizer, with weights trained on those features? And do the elements of the lookup table also automatically get added as examples for the entity (as if we had added the lookup table to our training examples as well)?

This depends on which components you’re using! But yes, for the RegexFeaturizer that’s the case. The elements of the lookup table are extracted as entities by the RegexEntityExtractor. I guess in a way they are added as examples, but they will only ever be visible to the RegexEntityExtractor.
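To make the two roles a bit more concrete, here’s a toy sketch of the idea (not Rasa’s actual implementation, just the gist of a lookup table being folded into one regex pattern):

```python
import re

# toy stand-in for a lookup table compiled into one alternation pattern
lookup = ["berlin", "london", "new york"]
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(city) for city in lookup) + r")\b",
    re.IGNORECASE,
)

message = "What's the weather in New York?"

# RegexFeaturizer-style: a match only adds a feature flag that trainable
# components downstream learn weights for
has_lookup_match = bool(pattern.search(message))   # True

# RegexEntityExtractor-style: the match itself is returned as the entity
entities = pattern.findall(message)                # ['New York']

print(has_lookup_match, entities)
```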
