Thanks for all the ideas. Much appreciated! Let me comment on each:
you could implement your own custom slot type (see: Domain)
This looks like a cool feature, but also too cool (aka overkill) for what I’m trying to achieve. All I really need is for Rasa to match something from a fixed set of possibilities, which is why the lookup table approach without custom code, i.e. only via YAML files, seems more attractive: it’s simpler to understand and maintain, and it should do the trick.
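For readers following along, here is a minimal sketch of what a YAML-only lookup table can look like in Rasa's NLU training data. The entity name `product` and the values `a`, `b`, `c` are placeholders, and the `version` string is an assumption that depends on the installed Rasa release; this is not necessarily the setup that ended up working.

```yaml
# data/nlu.yml (sketch only)
# "product" and its values are hypothetical placeholders.
# The version string is an assumption; use the one matching your Rasa release.
version: "3.1"
nlu:
- lookup: product
  examples: |
    - a
    - b
    - c
```

For the table to extract entities directly, the pipeline in `config.yml` needs a component that reads lookup tables, such as `RegexEntityExtractor`. As far as I understand the docs, the entity still needs a couple of annotated examples in the training data so that it is registered as an entity at training time.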
Alternatively, you can break your domain file down into several:
I suppose you meant to break down my “nlu” file? That’s where (I believe) one would annotate tokens with entity types. The size of the file is a concern, but I am actually more concerned with having to make up (or repeat) intent examples, just to squeeze in a new value for an entity type.
Most likely using a pretrained language model for entity extraction will help pick these small nuances up.
I am building a system for a real client, who is not very keen on error margins =) Which is why (again) a lookup table approach is more desirable, because it guarantees matching. An LM would be an interesting approach, I’d think, for wrong spellings/transcriptions, but maybe fuzzy matching takes care of that? In any case, wrong spellings/transcriptions are a problem for v 2.0 =)
Another thing you could do to avoid having examples for every single thing sprinkled across your training data is to use synonyms
Ah, better not =) I do need synonyms, but in the true sense of the word, not as a workaround. My list of entities will have the values a, b, c…, each of which should in turn map to its own synonyms: a1, a2, a3, b1, b2, b3…
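To make the a/a1/a2 mapping concrete, a rough sketch in Rasa's training-data format could look like the following. The values are the same placeholders used above, and the `version` string is again an assumption.

```yaml
# data/nlu.yml (sketch only)
# a, b and their variants are placeholders taken from the description above.
version: "3.1"
nlu:
- synonym: a
  examples: |
    - a1
    - a2
    - a3
- synonym: b
  examples: |
    - b1
    - b2
    - b3
```

This relies on `EntitySynonymMapper` being in the pipeline. Note that the mapping is applied after extraction, so a1, a2… would still have to be recognized as entity values in the first place, for example by also listing them in the lookup table.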
All that being said, I finally got it to work with a small modification to my initial setup. I will post the solution below.