Hi !
I’m using Spacy FR and DucklingHTTPExtractor
to extract dates & interval dimensions.
It work really well except for a corner case.
Combien d’utilisateurs ont été créés depuis hier ?
meaning:
How many users have been created since yesterday
DucklingHTTPExtractor
will extract hier (yesterday) as a date,
but also été (summer) as a date.
été means indeed summer in French, but ont été is simply have been
Is there a way to prevent this ?
Here is my current pipeline configuration:
language: "fr" # your two-letter language code
pipeline:
- name: SpacyNLP
- name: SpacyTokenizer
- name: SpacyFeaturizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
analyzer: "char_wb"
min_ngram: 1
max_ngram: 4
- name: DIETClassifier
epochs: 100
- name: SpacyEntityExtractor
- name: DucklingHTTPExtractor
url: http://localhost:8000
locale: "fr_FR"
# if not set the default timezone of Duckling is going to be used
# needed to calculate dates from relative expressions like "tomorrow"
timezone: "Europe/Paris"
# Timeout for receiving response from http url of the running duckling server
# if not set the default timeout of duckling http url is set to 3 seconds.
timeout : 3
- name: EntitySynonymMapper
- name: ResponseSelector
epochs: 100
# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
- name: MemoizationPolicy
- name: TEDPolicy
max_history: 5
epochs: 100
- name: MappingPolicy