Hi @lgabs, if you have the RegexFeaturizer in your pipeline and the assistant trains without any errors, the lookup tables are used. We don’t log when they are used successfully; we only log if something went wrong.
Regarding the usage: As pointed out in our documentation you need to include some examples in your training data that use an entry from the lookup table. The entries in the lookup tables are converted into features. Let’s take a look at an example:
“Hello my name is Tanja.”
Assume you have a lookup table for names that includes the word “Tanja”. The RegexFeaturizer checks whether it can find an entry from the lookup table in the text. If it finds a match, it sets a feature that basically says “Tanja” is listed in a lookup table. The features are then passed on to the model, for example DIETClassifier, and the model hopefully learns the correlation between “Tanja” being listed in the lookup table and “Tanja” being an entity of type name. If you don’t include any examples in the training data that contain an entry from the lookup table, the model cannot learn this correlation, because it is never present. So it is important that you include some entries from your lookup table in the training data and mark them as entities.
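To make the idea concrete, here is a minimal Python sketch of turning a lookup table into a per-token feature. It is not Rasa's actual implementation: the function names `build_lookup_pattern` and `lookup_features` are made up for illustration, and tokens are simply split on whitespace, which is an assumption.

```python
import re
from typing import List


def build_lookup_pattern(entries: List[str]) -> re.Pattern:
    """Combine all lookup entries into one case-insensitive regex (hypothetical helper)."""
    escaped = [re.escape(entry) for entry in entries]
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def lookup_features(text: str, pattern: re.Pattern) -> List[int]:
    """Return 1 for every whitespace token that overlaps a lookup match, else 0."""
    matches = list(pattern.finditer(text))
    features = []
    for token in re.finditer(r"\S+", text):  # assumption: whitespace tokenization
        hit = any(
            m.start() < token.end() and m.end() > token.start() for m in matches
        )
        features.append(int(hit))
    return features


names = ["Tanja", "Alice"]  # stand-in for the contents of a names lookup table
pattern = build_lookup_pattern(names)
print(lookup_features("Hello my name is Tanja.", pattern))  # [0, 0, 0, 0, 1]
```

The feature only says "this token is in the lookup table". Whether that implies an entity of type name is something the model still has to learn from annotated examples.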
Thanks for the reply and explanation! I am indeed including some examples in the training data, and I’m getting good results so far, except for names, which are very hard. I was afraid of using the wrong pattern when referencing the lookup tables as above, especially because Rasa X reformats the pattern to match ##lookup:<entity_name>.txt or csv, depending on the file, and in the Rasa demo there is a product entity, but the lookup in the NLU data refers to products.txt.
Using the RegexFeaturizer and DIET gives really bad results for my entities in lookup tables.
Although I have a lot of training sentences containing examples that are in the lookup tables (more than 200), the entities that do not appear in the training sentences are mostly not recognized.
I tried to open a topic about that (here) but never had a relevant answer.
I can share with you 200 tests comparing RegexFeaturizer + CRFEntityExtractor with RegexFeaturizer + DIETClassifier.
I just read through the thread you shared. I just want to make clear that DIETClassifier and CRFEntityExtractor do not influence each other. So if CRFEntityExtractor comes first in the pipeline, the entities detected by that component are not used inside DIETClassifier.
As mentioned above the RegexFeaturizer converts the lookup tables into features that are then used by the components later in the pipeline. So if you are using the DIETClassifier without entity recognition, but you have a RegexFeaturizer in the pipeline and defined some lookup tables, those features will also be used for the intent classification in the DIETClassifier. However, in our experience those features are normally not that important for intent classification.
Regarding the bad performance of the DIETClassifier on entities: It is hard to tell what might be going wrong without looking at the data and some evaluation metrics. How many entities do you have? What entities do you have? How many examples do you have per entity in your training data? Is it only the DIETClassifier that performs badly, or also the CRFEntityExtractor?
When I train Rasa NLU, I get the following warning: c:\users\isha\anaconda3\envs\installingrasa\lib\site-packages\rasa\nlu\featurizers\sparse_featurizer\regex_featurizer.py:79: FutureWarning: Directly including lookup tables as a list is deprecated since Rasa 1.6. regex_pattern = self._generate_lookup_regex(table)
I have a lookup table which I have mentioned this way in nlu.md
I have included some examples from it in the training data.
Both CRFEntityExtractor and DIETClassifier are in the pipeline, but with entity_recognition: False for DIETClassifier, so I want CRFEntityExtractor to use the lookup list. I also have the required RegexFeaturizer in my config.
Still, the lookup table is not being used: examples that are present in the lookup table but not in the training data are not being extracted.
I was able to identify and resolve my issue. Maybe this will help others who are facing difficulty with lookups. In the NLU data we shouldn’t put a hyphen in front of the file path for the lookup, because the RegexFeaturizer then treats it as a list. Only mention the path, without the hyphen, like so:
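A rough Python sketch of why the hyphen matters, assuming the lines under a lookup heading are read either as a bulleted list of elements or as a single file path. The helper `parse_lookup_body` and the path `data/lookup/names.txt` are hypothetical, and this is not Rasa's real parser:

```python
from typing import List, Union


def parse_lookup_body(lines: List[str]) -> Union[str, List[str]]:
    """Hypothetical helper: classify the lines below a lookup heading."""
    items = [line.strip() for line in lines if line.strip()]
    if items and all(item.startswith("- ") for item in items):
        # Bulleted lines are read as the lookup elements themselves,
        # so a bulleted path becomes a one-element list containing the path string.
        return [item[2:].strip() for item in items]
    if len(items) == 1:
        # A single bare line is read as the path of the lookup file.
        return items[0]
    raise ValueError("expected either bulleted elements or one file path")


with_hyphen = ["- data/lookup/names.txt"]    # hypothetical path, with hyphen
without_hyphen = ["data/lookup/names.txt"]   # same path, without hyphen

print(repr(parse_lookup_body(with_hyphen)))     # a list: the path is taken as an element
print(repr(parse_lookup_body(without_hyphen)))  # a string: the path of the file to load
```

With the hyphen, the only "entry" in the table is the path string itself, so none of the names inside the file are ever matched. That would also explain the FutureWarning about including lookup tables as a list.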