I’m using Spacy FR and
DucklingHTTPExtractor to extract dates & interval dimensions.
It work really well except for a corner case.
Combien d’utilisateurs ont été créés depuis hier ?
How many users have been created since yesterday
DucklingHTTPExtractor will extract hier (yesterday) as a date,
but also été (summer) as a date.
été means indeed summer in French, but ont été is simply have been
Is there a way to prevent this ?
Here is my current pipeline configuration:
language: "fr" # your two-letter language code pipeline: - name: SpacyNLP - name: SpacyTokenizer - name: SpacyFeaturizer - name: RegexFeaturizer - name: LexicalSyntacticFeaturizer - name: CountVectorsFeaturizer - name: CountVectorsFeaturizer analyzer: "char_wb" min_ngram: 1 max_ngram: 4 - name: DIETClassifier epochs: 100 - name: SpacyEntityExtractor - name: DucklingHTTPExtractor url: http://localhost:8000 locale: "fr_FR" # if not set the default timezone of Duckling is going to be used # needed to calculate dates from relative expressions like "tomorrow" timezone: "Europe/Paris" # Timeout for receiving response from http url of the running duckling server # if not set the default timeout of duckling http url is set to 3 seconds. timeout : 3 - name: EntitySynonymMapper - name: ResponseSelector epochs: 100 # Configuration for Rasa Core. # https://rasa.com/docs/rasa/core/policies/ policies: - name: MemoizationPolicy - name: TEDPolicy max_history: 5 epochs: 100 - name: MappingPolicy