How to extract multiple dates or names in a single form

Hi!

I have a use case where I need to extract multiple dates and names (e.g. date of appointment, date of birth, name of the patient, name of the doctor, etc.) in a single form. The extractions are made by Duckling (dates) and spaCy (names). What I don’t get is how to extract multiple entities of the same type in one form.

Adding dates and names to the NLU data and tagging them is not advised when using spaCy and Duckling, because it might cause a conflict, but I don’t really see another way around it. What I’ve done for now is add names with different roles (‘client’, ‘doctor’), which seems to work, but it’s not great: I now need to build a huge NLU dataset containing names, and spaCy is practically useless (it basically works as a validator for DIETClassifier). As for dates, I’m kind of clueless about how to do it properly. Adding dates with roles to the NLU data seems completely dumb (although I don’t really like adding names with roles to the NLU data either).

My config:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: pl

pipeline:
  - name: WhitespaceTokenizer
    intent_tokenization_flag: True
    intent_split_symbol: "+"
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "word"
  - name: DucklingEntityExtractor
    url: http://localhost:8000
  - name: DIETClassifier
    epochs: 100
    evaluate_on_number_of_examples: 0
    evaluate_every_number_of_epochs: 5
    tensorboard_log_directory: ".tensorboard"
    tensorboard_log_level: "epoch"
  - name: SpacyNLP
    model: "pl_core_news_md"
  - name: SpacyEntityExtractor
    dimensions: ["persName", "placeName"]
  # - name: rasa_nlu_components.extractors.EntityLemmaMapper
  - name: EntitySynonymMapper

  # Other components
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: faq
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: chitchat
policies:
   - name: MemoizationPolicy
   - name: TEDPolicy
     max_history: 10
     epochs: 150
     evaluate_on_number_of_examples: 0
     evaluate_every_number_of_epochs: 5
     tensorboard_log_directory: "./tensorboard"
     tensorboard_log_level: "epoch"
   - name: RulePolicy

Any advice would be great.

If you are extracting information strictly through a form, where you expect the user to answer each question one by one, you can use the from_text mapping for the slots and fill them while the form is active. Additionally, you can write validation functions that check the form input. Otherwise, if you expect the information to be given in one sentence, like:

My name is John and I am looking for an appointment with Dr. Carl

In this case, using roles is the best choice to disambiguate between different names. We are working on making role classification less data-hungry, and you can expect that to follow in one of the near-term releases.
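To make the role-based approach a bit more concrete, here is a minimal Python sketch of how role-tagged entities could be routed to separate slots, for example inside a custom action or a form validation function. It assumes the entities arrive as a list of dicts with "entity", "role" and "value" keys, which is how they appear in Rasa's parsed message output. The slot names (patient_name, doctor_name) and the helper itself are made up for illustration, and the roles 'client' and 'doctor' are the ones mentioned in the question.

```python
from typing import Any, Dict, List, Tuple

# Training data for this would annotate each name with a role, e.g.:
#   My name is [John]{"entity": "persName", "role": "client"} and I am looking
#   for an appointment with Dr. [Carl]{"entity": "persName", "role": "doctor"}

# Hypothetical mapping from (entity, role) to slot name; the slot names are
# illustrative assumptions, not taken from the original post.
ROLE_TO_SLOT: Dict[Tuple[str, str], str] = {
    ("persName", "client"): "patient_name",
    ("persName", "doctor"): "doctor_name",
}


def slots_from_entities(entities: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Map role-tagged entities to slot values, keeping the first match per slot."""
    slots: Dict[str, Any] = {}
    for ent in entities:
        slot = ROLE_TO_SLOT.get((ent.get("entity"), ent.get("role")))
        # Entities without a known role are skipped rather than guessed at.
        if slot is not None and slot not in slots:
            slots[slot] = ent.get("value")
    return slots


# Entities shaped like the "entities" list of a parsed message.
example_entities = [
    {"entity": "persName", "role": "client", "value": "John"},
    {"entity": "persName", "role": "doctor", "value": "Carl"},
    {"entity": "persName", "value": "Anna"},  # no role, so it is ignored
]

print(slots_from_entities(example_entities))
```

Skipping entities that have no role is a deliberate choice in this sketch: a name extracted without a role (for instance by SpacyEntityExtractor alone) cannot be assigned to the patient or the doctor without further context, so it is safer to ask the user again than to fill the wrong slot.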