Prevent a group of words from being extracted as an entity

Hi!

I’m using spaCy (French) and DucklingHTTPExtractor to extract date and interval dimensions. It works really well except for one corner case:

Combien d’utilisateurs ont été créés depuis hier ?


(In English: How many users have been created since yesterday?)

DucklingHTTPExtractor correctly extracts "hier" (yesterday) as a date, but it also extracts "été" (summer) as a date.

"été" does mean "summer" in French, but in "ont été" it is the past participle of "être", so the phrase simply means "have been".

Is there a way to prevent this?
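For illustration, one possible post-processing workaround is sketched here in Python. It is only a heuristic under stated assumptions: entities are assumed to be dictionaries with `start`, `end`, `text` and `entity` keys, the sample entities are hand-written to mirror the behaviour described above (they are not captured Duckling output), and `AVOIR_FORMS`, `is_participle_ete`, `drop_participle_ete` and `make_entity` are hypothetical names, not part of Rasa or Duckling.

```python
import re

# Hypothetical names throughout: this is not part of Rasa or Duckling.
# Forms of "avoir" that can directly precede the past participle "été".
AVOIR_FORMS = {"ai", "as", "a", "avons", "avez", "ont",
               "avais", "avait", "avions", "aviez", "avaient"}


def is_participle_ete(text, entity):
    """True if the entity covers "été" right after a form of "avoir"."""
    if text[entity["start"]:entity["end"]].lower() != "été":
        return False
    # Inspect the word immediately before the matched span.
    words_before = re.findall(r"\w+", text[:entity["start"]].lower())
    return bool(words_before) and words_before[-1] in AVOIR_FORMS


def drop_participle_ete(text, entities):
    """Keep every entity except "été" used as a past participle."""
    return [e for e in entities if not is_participle_ete(text, e)]


def make_entity(text, word):
    """Build a hand-written sample entity for `word` (not real Duckling output)."""
    start = text.index(word)
    return {"start": start, "end": start + len(word), "text": word, "entity": "time"}


question = "Combien d’utilisateurs ont été créés depuis hier ?"
extracted = [make_entity(question, "été"), make_entity(question, "hier")]
print([e["text"] for e in drop_participle_ete(question, extracted)])  # prints ['hier']
```

The check only looks at the single word before "été", so it would miss cases with an adverb in between ("ont déjà été") or an inverted subject ("a-t-il été"). A phrase such as "depuis l'été dernier" is left untouched because "l" is not a form of "avoir".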

Here is my current pipeline configuration:

language: "fr"  # your two-letter language code

pipeline:
  - name: SpacyNLP
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: SpacyEntityExtractor
  - name: DucklingHTTPExtractor
    url: http://localhost:8000
    locale: "fr_FR"
    # if not set the default timezone of Duckling is going to be used
    # needed to calculate dates from relative expressions like "tomorrow"
    timezone: "Europe/Paris"
    # Timeout (in seconds) for receiving a response from the running Duckling server;
    # if not set, the default timeout of 3 seconds is used.
    timeout: 3
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100

# Configuration for Rasa Core.
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
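
To check what Duckling returns on its own, independently of the rest of the pipeline, the server can be queried directly. The following is a minimal Python sketch, assuming the Duckling server from the configuration above is reachable at http://localhost:8000 and exposes its usual `/parse` endpoint taking form-encoded `text`, `locale` and `tz` fields. The function names `build_parse_request` and `parse_with_duckling` are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_parse_request(base_url, text, locale, tz):
    """Build (but do not send) a form-encoded POST request for Duckling's /parse."""
    data = urlencode({"text": text, "locale": locale, "tz": tz}).encode("utf-8")
    return Request(f"{base_url}/parse", data=data, method="POST")


def parse_with_duckling(base_url, text, locale="fr_FR", tz="Europe/Paris", timeout=3):
    """Send the request and decode the JSON body returned by the server."""
    request = build_parse_request(base_url, text, locale, tz)
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (needs a running Duckling server, so it is left commented out):
# matches = parse_with_duckling(
#     "http://localhost:8000",
#     "Combien d’utilisateurs ont été créés depuis hier ?",
# )
```

Each item in the returned list should describe one match (Duckling reports fields such as `body`, `start`, `end` and `dim`), which makes it easy to see whether "été" is matched by Duckling itself or introduced elsewhere in the pipeline.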