Hi! I currently have a Rasa NLU model with several intents, including weather, jokes, and hello.
The weather intent needs to access a city lookup table with ~100000 values. My questions are:
How much of an impact will this have in terms of performance when extracting a city entity from a weather intent?
Will this impact the performance of determining the intents for jokes and hello?
Personally, everything felt a teensy bit slower for me (~0.5s to 1s), but it may be a trick of the eye as I’m also calling APIs. Any insight on this is appreciated!
With 100k elements you may experience a slightly longer evaluation time, but the biggest impact will be on training: you’ll likely notice the model takes longer to train with that many elements in your lookup table.
Figured I’d mention it here, in case you’re using the lookup tables for entity extraction: there’s a really neat trick if you only need them for plain string matching. We can use a data structure to keep the compute costs at bay.
My colleague @fkoerner implemented a FlashTextEntityExtractor over at our rasa-nlu-examples repository. She even made a repo with a benchmark to demonstrate the speedup. I’m currently working on an algorithm whiteboard episode that will explain how it works, but if you notice a performance hit this tool will keep the compute cost at bay.
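For anyone curious about the data structure idea before the episode is out, here is a rough sketch of a trie-based keyword matcher in plain Python. To be clear, this is not the FlashText or FlashTextEntityExtractor code; build_trie and extract are hypothetical names made up for illustration, and word boundaries are simplified to "letters, digits and underscore". The point it tries to show is that the text is scanned once, so lookup cost depends on the length of the message rather than on how many cities are in the table.

```python
from typing import Dict, List, Tuple

_END = "_end_"  # marker key meaning "a full keyword ends at this node"


def build_trie(keywords: List[str]) -> Dict:
    """Build a character trie; keywords are matched case-insensitively."""
    root: Dict = {}
    for kw in keywords:
        node = root
        for ch in kw.lower():
            node = node.setdefault(ch, {})
        node[_END] = kw  # remember the original spelling
    return root


def _is_word_char(ch: str) -> bool:
    # Simplified notion of a word character (an assumption of this sketch).
    return ch.isalnum() or ch == "_"


def extract(text: str, trie: Dict) -> List[Tuple[str, int, int]]:
    """Return (keyword, start, end) for the longest match at each word start.

    Offsets refer to the lowercased text, which has the same length as the
    input for typical ASCII messages.
    """
    lowered = text.lower()
    n = len(lowered)
    results = []
    i = 0
    while i < n:
        # Only start a match at a word boundary.
        if i > 0 and _is_word_char(lowered[i - 1]):
            i += 1
            continue
        node, j, best = trie, i, None
        while j < n and lowered[j] in node:
            node = node[lowered[j]]
            j += 1
            # Accept a keyword only if it also ends at a word boundary.
            if _END in node and (j == n or not _is_word_char(lowered[j])):
                best = (node[_END], i, j)
        if best:
            results.append(best)
            i = best[2]  # continue after the matched keyword
        else:
            i += 1
    return results


cities = build_trie(["Berlin", "New York", "York"])
print(extract("What's the weather in New York and berlin?", cities))
```

Adding more cities makes the trie bigger, but each message is still walked character by character, which is why a large lookup table hurts much less with this approach than with one big regex.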
Hi, an update on the FlashTextEntityExtractor, just wanted to say that it works like a huge charm and the replies I get are almost instantaneous now!
I did however have a small bump trying to get it installed. It seems that I had to manually install flashtext using pip install flashtext, otherwise I would get a module not found error.
There also seems to be a teensy typo in the FlashTextEntityExtractor docs, as the correct key seems to be “non_word_boundaries” instead of “non_word_boundary”.
Anyhow, thank you again! It works so smoothly and beautifully with my huge lookup table!
How does Rasa use lookup tables in training? Do all of the elements of the table go through a RegexFeaturizer, with weights then trained on those features? Do the elements of the lookup table also automatically get added as examples for the entity (as if we were to add the lookup table to our training examples as well)?
This depends on which components you’re using! But yes, for the RegexFeaturizer that’s the case. If you use the RegexEntityExtractor, the elements of the lookup table are extracted as entities by it directly. I guess in a way they are added as examples, but they will only be visible to the RegexEntityExtractor.
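To make the regex side a bit more concrete, here is a small sketch of the general idea of turning a lookup table into one big pattern and reading entity spans off the matches. This is my own simplified illustration and not Rasa's actual implementation; lookup_to_regex and find_entities are hypothetical names, and the "city" label and the dict layout are assumptions chosen for the example.

```python
import re
from typing import Dict, List


def lookup_to_regex(elements: List[str]) -> re.Pattern:
    """Join all lookup elements into a single word-bounded alternation."""
    # Longest first, so "New York" is preferred over "York".
    ordered = sorted(elements, key=len, reverse=True)
    escaped = [re.escape(e) for e in ordered]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def find_entities(text: str, pattern: re.Pattern, entity: str = "city") -> List[Dict]:
    """Return one dict per match with the matched value and its span."""
    return [
        {"entity": entity, "value": m.group(0), "start": m.start(), "end": m.end()}
        for m in pattern.finditer(text)
    ]


pattern = lookup_to_regex(["Berlin", "New York", "York"])
print(find_entities("Weather in new york?", pattern))
```

With ~100k elements the alternation becomes very long, which is roughly where the extra cost discussed earlier in this thread comes from.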