How can RegexFeaturizer be utilized in the DIET system?

I am wondering how RegexFeaturizer works with the vector-based DIET classifier.

Looking at the NLU pipeline, I see that we can add RegexFeaturizer, but I don't know how it works. Is there an extra step that generalizes the regex rules from the training data and vectorizes them as sparse features, such as one-hot encodings?

The RegexFeaturizer only has an effect if your training data contains a lookup table (or regex patterns). A lookup table looks something like this:

```yaml
nlu:
- lookup: country
  examples: |
    - Afghanistan
    - Albania
    - ...
    - Zambia
    - Zimbabwe
```
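Under the hood, a lookup table is turned into a single regex pattern. The sketch below shows that conversion, assuming each entry is escaped and wrapped in word boundaries, which is my understanding of what Rasa does for whitespace-tokenized languages. `lookup_to_pattern` is a hypothetical helper for illustration, not a Rasa API.

```python
import re
from typing import Iterable


def lookup_to_pattern(elements: Iterable[str]) -> str:
    """Join lookup-table entries into one alternation regex.

    Hypothetical helper: a simplified sketch, not Rasa's own code.
    re.escape makes characters like '.' or '(' match literally, and the
    \\b word boundaries stop 'Niger' from matching inside 'Nigeria'.
    """
    return "|".join(rf"\b{re.escape(e)}\b" for e in elements)


countries = ["Afghanistan", "Albania", "Zambia", "Zimbabwe"]
pattern = lookup_to_pattern(countries)
print(pattern)  # \bAfghanistan\b|\bAlbania\b|\bZambia\b|\bZimbabwe\b
print(bool(re.search(pattern, "I want to fly to Zambia")))  # True
```

Word boundaries matter for lookup tables with short entries, since otherwise one entry can match inside an unrelated, longer word.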

Once the lookup table is added (and RegexFeaturizer is in your NLU pipeline), DIET will use the resulting sparse features.
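On the vectorization part of the question: as far as I understand it, RegexFeaturizer does not generalize the patterns itself. It checks every pattern against the message and records a 0/1 flag per token and per pattern, and DIET learns how to weight those flags. The following is a simplified pure-Python illustration of that idea, not Rasa's implementation (Rasa stores the result as sparse matrices). `tokenize`, `regex_features`, and the `zipcode` pattern are hypothetical names chosen for the example.

```python
import re
from typing import Dict, List, Tuple

Token = Tuple[str, int, int]  # (token text, start offset, end offset)


def tokenize(text: str) -> List[Token]:
    """Naive whitespace tokenizer that keeps character offsets."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def regex_features(
    text: str, patterns: Dict[str, str]
) -> Tuple[List[List[int]], List[int]]:
    """Return (sequence_features, sentence_features).

    sequence_features: one row per token, one column per pattern; a cell
    is 1 if the token's span overlaps a match of that pattern.
    sentence_features: one column per pattern; 1 if any token overlapped
    a match of that pattern.
    """
    tokens = tokenize(text)
    names = list(patterns)
    seq = [[0] * len(names) for _ in tokens]
    sent = [0] * len(names)
    for j, name in enumerate(names):
        for match in re.finditer(patterns[name], text):
            for i, (_, start, end) in enumerate(tokens):
                # Character spans of the token and the match overlap.
                if start < match.end() and end > match.start():
                    seq[i][j] = 1
                    sent[j] = 1
    return seq, sent


patterns = {
    "country": r"\bAlbania\b|\bZambia\b",  # e.g. built from a lookup table
    "zipcode": r"\b\d{5}\b",  # hypothetical regex rule
}
message = "ship to Zambia 12345"
seq, sent = regex_features(message, patterns)
for (tok, _, _), row in zip(tokenize(message), seq):
    print(tok, row)
print("sentence", sent)
```

In this example only the token "Zambia" sets the `country` column and only "12345" sets the `zipcode` column, so each pattern behaves like one extra binary feature dimension.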
