Is it possible to define entity values upfront, instead of learning them from examples?

rodrigo_deoliveira · June 10, 2021, 6:02pm

I know you can annotate words in intent examples with an entity, e.g.:

- intent: add.drink
  examples: |
    - can I get a [small](drink.size) [latte](drink.type)

But adding (at least) 1 example for every value of every entity in my database would make the NLU file huge!

Is it possible to define entity values elsewhere, thereby making the NLU file (c)leaner?

Things I have tried, to no avail:

Lookup tables:

- lookup: drink.size
  examples: |
    - small
    - medium
    - large

Categorical slots:

slots:
  drink.size:
    type: categorical
    influence_conversation: true
    values:
    - small
    - medium
    - large

My pipeline:

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: RegexEntityExtractor
    case_sensitive: false
    use_lookup_tables: true
    use_regexes: true
    use_word_boundaries: true

snek · June 10, 2021, 8:37pm

How did you implement lookup tables and categorical slots?

rodrigo_deoliveira · June 11, 2021, 10:17am

As above really. Lookup tables in data/nlu.yml and categorical slots in domain.yml.

snek · June 15, 2021, 6:20am

If your goal is to make the NLU files smaller/cleaner, you could implement your own custom slot type (see: https://rasa.com/docs/rasa/domain#custom-slot-types), which you would write in Python and which would let you load a lookup table from a text file instead of dumping the whole thing in your domain file.

Alternatively, you can break your domain file down into several:

The domain can be defined as a single YAML file or split across multiple files in a directory. (See Domain)

Finally, you do not really need to provide at least one example for every value. Most likely using a pretrained language model for entity extraction will help pick these small nuances up. Another thing you could do to avoid having examples for every single thing sprinkled across your training data is to use synonyms (see here: https://rasa.com/docs/rasa/nlu-training-data#synonyms).

rodrigo_deoliveira · June 15, 2021, 10:16am

Thanks for all the ideas. Much appreciated! Let me comment on each:

you could implement your own custom slot type (see: Domain)

This looks like a cool feature, but also too cool (aka overkill) for what I’m trying to achieve. All I really need is that Rasa matches something from a fixed set of possibilities, which is why the lookup table approach without custom code, i.e. only via yaml files, seems more attractive: it’s simpler to understand/maintain and should do the trick.

Alternatively, you can break your domain file down into several:

I suppose you meant to break down my “nlu” file? That’s where (I believe) one would annotate tokens with entity types. The size of the file is a concern, but I am actually more concerned with having to make up (or repeat) intent examples, just to squeeze in a new value for an entity type.

Most likely using a pretrained language model for entity extraction will help pick these small nuances up.

I am building a system for a real client, who is not very keen on error margins =) Which is why (again) a lookup table approach is more desirable, because it guarantees matching. A LM would be an interesting approach, I’d think, for wrong spellings/transcriptions, but maybe fuzzy matching takes care of that? In any case, wrong spellings/transcriptions are a problem for v2.0 =)

Another thing you could do to avoid having examples for every single thing sprinkled across your training data is to use synonyms

Ah, better not =) I do need synonyms but in the true sense of the word, not as a workaround. My list of entities will have a, b, c… which in turn should map to their respective synonyms, a1, a2, a3, b1, b2, b3…
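
To make the intended mapping concrete, here is a minimal Python sketch of the idea, independent of Rasa: each canonical value (a, b, ...) owns a list of synonyms (a1, a2, ...), and any matched surface form is normalized back to its canonical value. The function names `build_synonym_index` and `normalize` are hypothetical and are not part of any Rasa API.

```python
def build_synonym_index(synonyms):
    """Invert {canonical: [variants]} into {lowercased surface form: canonical}."""
    index = {}
    for canonical, variants in synonyms.items():
        # The canonical value maps to itself.
        index[canonical.lower()] = canonical
        for variant in variants:
            index[variant.lower()] = canonical
    return index


def normalize(value, index):
    """Return the canonical value for a surface form, or the input if unknown."""
    return index.get(value.lower(), value)


# Placeholder values taken from the a/b naming used in the post above.
index = build_synonym_index({"a": ["a1", "a2", "a3"], "b": ["b1", "b2", "b3"]})
canonical_a = normalize("A2", index)
canonical_b = normalize("b3", index)
unknown = normalize("c", index)
```

Unknown values pass through unchanged, so the lookup list and the synonym list can be maintained separately.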

That all being said, I finally got it to work with a small modification to my initial setup. Will post the solution below.

rodrigo_deoliveira · June 15, 2021, 10:23am

Got it to work with almost the same setup as before:

Lookup tables in nlu.yml:

- lookup: drink.size
  examples: |
    - small
    - medium
    - large

No categorical slots in domain.yml; my slots are now what they should be: text, list, etc.

And I removed the DIETClassifier from the pipeline in config.yml:

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: RegexEntityExtractor
  case_sensitive: false
  use_lookup_tables: true
  use_regexes: true
  use_word_boundaries: true
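
For readers wondering why this kind of setup matches a fixed set of values deterministically, the following Python sketch approximates the idea behind lookup-table matching: the entries are joined into one regular expression, optionally wrapped in word boundaries and compiled case-insensitively. This is an illustration only, not Rasa's actual implementation; `build_lookup_pattern` and `extract_entities` are hypothetical names, and the keyword arguments merely mirror the config options above.

```python
import re


def build_lookup_pattern(values, case_sensitive=False, use_word_boundaries=True):
    """Compile one regex that matches any of the lookup values."""
    # Longest entries first, so a longer value wins over a shorter prefix.
    alternatives = sorted((re.escape(v) for v in values), key=len, reverse=True)
    body = "(?:" + "|".join(alternatives) + ")"
    if use_word_boundaries:
        body = r"\b" + body + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(body, flags)


def extract_entities(text, entity, pattern):
    """Return one dict per match, with character offsets into the text."""
    return [
        {"entity": entity, "value": m.group(0), "start": m.start(), "end": m.end()}
        for m in pattern.finditer(text)
    ]


size_pattern = build_lookup_pattern(["small", "medium", "large"])
matches = extract_entities("Can I get a LARGE latte", "drink.size", size_pattern)
no_matches = extract_entities("the smallest cup", "drink.size", size_pattern)
```

With word boundaries enabled, "smallest" does not match "small", and the case-insensitive flag lets "LARGE" match the entry "large".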

Apparently one can also extract the lookup table to a text file (as per this tutorial at 42:16), but I haven’t tried that yet.
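
If the values live in a database or a plain text file, one alternative is to generate the lookup block instead of maintaining it by hand. The sketch below is a hedged example of that approach, not a Rasa feature: `render_lookup_block` is a hypothetical helper that turns an iterable of raw lines into the same `- lookup:` YAML shown above, skipping blanks and duplicates.

```python
def render_lookup_block(entity, values):
    """Render a lookup-table entry for nlu.yml from an iterable of raw values."""
    lines = [f"- lookup: {entity}", "  examples: |"]
    seen = set()
    for raw in values:
        value = raw.strip()
        # Skip empty lines and values that were already emitted.
        if not value or value in seen:
            continue
        seen.add(value)
        lines.append(f"    - {value}")
    return "\n".join(lines) + "\n"


# The list stands in for the lines of a text file, e.g. open("drink_sizes.txt"),
# where the file name is an assumption for illustration.
block = render_lookup_block(
    "drink.size", ["small\n", "medium\n", "", "large\n", "small\n"]
)
```

The resulting text still has to be placed under the `nlu:` key of the training data file before training.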
