Should I annotate all examples in my dataset?

I’m building a bot and trying to extract an entity for how many push-ups someone has done in a day.

If I have in my dataset:

  • monday 2
  • tuesday 2

Should I annotate ‘2’ in both examples?

Yes, you should annotate every example in your training data if you are going to use a machine-learning-based entity extractor (like DIET).

Alternatively, you could use the regex entity extractor to specify a pattern to look for. With this approach, per the docs, you would only need to annotate at least two training examples:

"you do need at least two annotated examples of the entity so that the NLU model can register it as an entity at training time."
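
To show why a pattern removes the need to annotate every example, here is a rough Python sketch of pattern-based extraction. It is not Rasa's actual implementation; the entity name pushup_count, the pattern, and the function extract_pushup_counts are all made up for illustration, and the pattern assumes the count is written as plain digits.

```python
import re

# Hypothetical pattern (an assumption, not taken from the thread):
# a push-up count is one or more digits standing alone as a word.
PUSHUP_COUNT = re.compile(r"\b\d+\b")


def extract_pushup_counts(text):
    """Return each number found in text as an entity dict with its character span."""
    return [
        {
            "entity": "pushup_count",  # hypothetical entity name
            "value": match.group(),
            "start": match.start(),
            "end": match.end(),
        }
        for match in PUSHUP_COUNT.finditer(text)
    ]


# The same pattern picks up the number in both messages,
# with no per-example annotation involved.
for message in ["monday 2", "tuesday 2"]:
    print(message, "->", extract_pushup_counts(message))
```

The trade-off is that a pattern like this matches any standalone number, so it only works well if numbers in your messages always mean push-up counts. A machine-learning-based extractor like DIET learns from context instead, which is why it needs the annotations.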
