DucklingEntityExtractor not marking year as time entity

Sometimes the DucklingEntityExtractor doesn’t extract the year as a time entity.

Example 1: if I provide the year before the month:

```
? Your input -> magazine_name 2020 jan
? Is the intent 'download_magazine' correct for '[magazine_name](magazine) 2020 [jan]{"entity": "time", "value": "2023-01-01T00:00:00.000+00:00"}' and are all entities labeled correctly? (Y/n)
```

But if I provide the month followed by the year, it works:

```
? Your input -> magazine_name jan 2020
? Is the intent 'download_magazine' correct for '[magazine_name](magazine) [jan 2020]{"entity": "time", "value": "2020-01-01T00:00:00.000+00:00"}' and are all entities labeled correctly? (Y/n)
```

Example 2: if only a year is provided, 2020 should have been marked as a year, but it is not marked at all:

```
? Your input -> magazine_name 2020
? Is the intent 'download_magazine' correct for '[magazine_name](magazine) 2020' and are all entities labeled correctly? (Y/n)
```

Can anybody suggest what changes I can make to get these entities marked as time in the examples above?

The pipeline configuration is as follows:

```yaml
pipeline:
  # # No configuration for the NLU pipeline was provided. The following default pipeline was used to train your model.
  # # If you'd like to customize it, uncomment and adjust the pipeline.
  # # See https://rasa.com/docs/rasa/tuning-your-model for more information.
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1
  - name: "DucklingEntityExtractor"
    url: "http://localhost:8000"
    dimensions: ["time"]
    timezone: "Asia/kolkata"
```

The NLU data for the intent is as follows:

```yaml
version: "3.0"

nlu:
  - intent: download_magazine
    examples: |
      - download [magazine_name](magazine) 2020 Jan
      - [magazine_name](magazine) February 2022
      - last month [magazine_name](magazine)
      - download [magazine_name](magazine) last december
      - download [magazine_name](magazine) March 2018
      - [magazine_name](magazine)
```

Best to post this on the duckling project.

Thanks for the reply @stephens

But Duckling identifies these dates perfectly. I have attached two screenshots from https://duckling.wit.ai/ showing this.


And when I try it with DucklingEntityExtractor, the year 2020 is ignored, as shown in the snippet below:

```
? Your input -> magazine_name 2020 jan
? Is the intent 'download_magazine' correct for '[magazine_name](magazine) 2020 [jan]{"entity": "time", "value": "2023-01-01T00:00:00.000+00:00"}' and are all entities labeled correctly? (Y/n)
```

So my understanding is that the problem is not with Duckling but with DucklingEntityExtractor.

Your pipeline includes DIET, so I would look at the --debug log to confirm that the entity extraction you’re seeing in interactive learning comes from Duckling and not from DIET. It’s possible that both entity extractors are extracting the date (although your nlu.yml leads me to believe that your example should be using Duckling).

Here’s an example of a debug log line that shows which entity extractor produced each entity in a user message:

```
2022-04-27 16:33:11 DEBUG    rasa.core.processor  - Received user message 'quote bitcoin' with intent '{'id': -5105866451839917756, 'name': 'single_quote', 'confidence': 0.9926217198371887}' and entities '[{'entity': 'crypto_symbol', 'start': 6, 'end': 13, 'confidence_entity': 0.9051501154899597, 'value': 'BTC', 'extractor': 'DIETClassifier', 'processors': ['EntitySynonymMapper']}]'
```
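To check this without scanning raw log lines, a small Python sketch (an illustration, not part of Rasa) can group a parsed message's entities by their `extractor` field. It assumes entity dicts shaped like the ones in the debug logs in this thread; the sample values are copied from the log for 'magazine_name 2020 jan', with the extra fields trimmed.

```python
from collections import defaultdict
from typing import Any, Dict, List


def entities_by_extractor(entities: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group entity labels ("entity=value") by the component that extracted them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        # The debug log records the producing component under the "extractor" key.
        grouped[ent.get("extractor", "unknown")].append(f"{ent['entity']}={ent['value']}")
    return dict(grouped)


# Entities trimmed from the debug log for the message 'magazine_name 2020 jan'.
entities = [
    {"entity": "magazine", "start": 0, "end": 8, "value": "magazine_name",
     "extractor": "DIETClassifier"},
    {"entity": "time", "start": 14, "end": 17, "value": "2023-01-01T00:00:00.000+00:00",
     "extractor": "DucklingEntityExtractor"},
]

for extractor, labels in entities_by_extractor(entities).items():
    print(extractor, labels)
```

If a value such as 2020 appears under no extractor at all, neither DIET nor Duckling produced it.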

```
Received user message 'magazine_name 2020 jan' with intent '{'name': 'download_magazine', 'confidence': 1.0}' and entities '[{'entity': 'magazine', 'start': 0, 'end': 8, 'confidence_entity': 0.9991121888160706, 'value': 'magazine_name', 'extractor': 'DIETClassifier', 'processors': ['EntitySynonymMapper']}, {'start': 14, 'end': 17, 'text': 'jan', 'value': '2023-01-01T00:00:00.000+00:00', 'confidence': 1.0, 'additional_info': {'values': [{'value': '2023-01-01T00:00:00.000+00:00', 'grain': 'month', 'type': 'value'}, {'value': '2024-01-01T00:00:00.000+00:00', 'grain': 'month', 'type': 'value'}, {'value': '2025-01-01T00:00:00.000+00:00', 'grain': 'month', 'type': 'value'}], 'value': '2023-01-01T00:00:00.000+00:00', 'grain': 'month', 'type': 'value'}, 'entity': 'time', 'extractor': 'DucklingEntityExtractor'}]'
```

None of the entity extractors picks up 2020: Duckling extracts jan as time, and DIET extracts magazine_name.

```
? Is the intent 'download_magazine' correct for '[magazine_name](magazine) 2020 [jan]{"entity": "time", "value": "2023-01-01T00:00:00.000+00:00"}' and are all entities labeled correctly? (Y/n)
```

Even if I disable my whole pipeline and keep only Duckling, the year is not marked as an entity:

```
The NLU classification for 'magazine_name 2020 [jan]{"entity": "time", "value": "2023-01-01T00:00:00.000+00:00"}' returned 'None'
? What intent is it? (Use arrow keys)
```

Pipeline:

```yaml
pipeline:
  - name: "DucklingEntityExtractor"
    url: "http://localhost:8000"
    dimensions: ["time"]
    timezone: "Asia/kolkata"
```

I’d look at the source code next. It looks like the Duckling extractor isn’t being passed the entire utterance. I’ve never looked at the extractor before; maybe there’s a reason it doesn’t always process the whole utterance?

I checked the code: the whole text is being passed to Duckling.

The problem is in the Duckling parsing.

When I made this request:

```shell
curl --location --request POST 'http://0.0.0.0:8000/parse' \
--header 'Content-Type: application/x-www-form-urlencoded' \
--data-urlencode 'text=magazine_name 2020 jan ' \
--data-urlencode 'locale=null' \
--data-urlencode 'tz=Asia/kolkata' \
--data-urlencode 'dims=null' \
--data-urlencode 'reftime=1651122390000'
```
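The same request can be sent from Python with only the standard library, which makes it easy to compare word orders against the Duckling server. This is a sketch: the URL `http://localhost:8000/parse` is assumed to be the server from the pipeline config, and it leaves out the `locale` and `dims` fields (the curl request sends the literal string `null` for both), assuming Duckling then falls back to its defaults.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, List, Optional

# Assumption: the Duckling server configured in the Rasa pipeline.
DUCKLING_URL = "http://localhost:8000/parse"


def build_parse_body(text: str, tz: str, reftime_ms: int,
                     locale: Optional[str] = None) -> bytes:
    """Form-encode the fields sent to Duckling's /parse endpoint."""
    fields = {"text": text, "tz": tz, "reftime": str(reftime_ms)}
    if locale is not None:
        fields["locale"] = locale
    return urllib.parse.urlencode(fields).encode("utf-8")


def duckling_parse(text: str, tz: str, reftime_ms: int,
                   url: str = DUCKLING_URL) -> List[Any]:
    """POST the text to Duckling and return the decoded JSON list of matches."""
    request = urllib.request.Request(
        url,
        data=build_parse_body(text, tz, reftime_ms),  # data makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, `[m["dim"] for m in duckling_parse("magazine_name 2020 jan ", "Asia/kolkata", 1651122390000)]` should mirror the curl response, and passing `"magazine_name jan 2020"` instead lets you compare the other order.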

Response: 2020 is picked up as the number dimension, not as time:

```json
[
    {
        "body": "2020",
        "start": 14,
        "value": {
            "value": 2020,
            "type": "value"
        },
        "end": 18,
        "dim": "number",
        "latent": false
    },
    {
        "body": "jan",
        "start": 19,
        "value": {
            "values": [
                {
                    "value": "2023-01-01T00:00:00.000+00:00",
                    "grain": "month",
                    "type": "value"
                },
                {
                    "value": "2024-01-01T00:00:00.000+00:00",
                    "grain": "month",
                    "type": "value"
                },
                {
                    "value": "2025-01-01T00:00:00.000+00:00",
                    "grain": "month",
                    "type": "value"
                }
            ],
            "value": "2023-01-01T00:00:00.000+00:00",
            "grain": "month",
            "type": "value"
        },
        "end": 22,
        "dim": "time",
        "latent": false
    }
]
```
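The 2023, 2024 and 2025 values for jan come from the `reftime` in the request: 1651122390000 ms is 28 April 2022 (UTC), and Duckling appears to resolve a bare month name to its upcoming occurrences after that reference time. A quick Python check of that reading (an assumed model of Duckling's behaviour, not its actual code):

```python
from datetime import datetime, timezone
from typing import List

REFTIME_MS = 1651122390000  # the reftime sent in the curl request


def upcoming_month_starts(ref: datetime, month: int, count: int = 3) -> List[datetime]:
    """First day of the next `count` occurrences of `month` after `ref`, in UTC.

    Assumed model only: when `month` equals ref's own month this starts next year,
    which may not match Duckling.
    """
    first_year = ref.year if month > ref.month else ref.year + 1
    return [datetime(first_year + i, month, 1, tzinfo=timezone.utc) for i in range(count)]


ref = datetime.fromtimestamp(REFTIME_MS / 1000, tz=timezone.utc)
print(ref.isoformat())  # 2022-04-28T05:06:30+00:00
print([d.date().isoformat() for d in upcoming_month_starts(ref, 1)])
```

Because 2020 comes back as a separate number match, nothing ties jan to that year, so it resolves relative to the reference time instead.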

The website shows an intersection of the dimensions, but the /parse endpoint doesn’t return it.
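Until the ordering is handled on the Duckling side, one possible workaround is to post-process Duckling's raw /parse output and join a four-digit number that directly precedes a month into one time value. This sketch is hypothetical, not a Rasa or Duckling feature: it assumes the number dimension is also available (for example by adding "number" to the extractor's dimensions, or by calling Duckling directly), treats four-digit numbers from 1900 to 2100 as years, and would still need to be wrapped in a custom component to run in a pipeline. It also turns a lone year, as in Example 2, into a year-grain time value with an assumed +00:00 offset.

```python
from typing import Any, Dict, List, Optional

# Assumption: four-digit numbers in this range are read as years.
YEAR_MIN, YEAR_MAX = 1900, 2100


def _year_of(match: Dict[str, Any]) -> Optional[int]:
    """Return the year if `match` is a Duckling number match that looks like one."""
    value = match.get("value", {}).get("value")
    if match.get("dim") != "number" or isinstance(value, bool):
        return None
    if not isinstance(value, (int, float)) or not float(value).is_integer():
        return None
    year = int(value)
    if len(match.get("body", "")) == 4 and YEAR_MIN <= year <= YEAR_MAX:
        return year
    return None


def _is_month(match: Dict[str, Any]) -> bool:
    return match.get("dim") == "time" and match.get("value", {}).get("grain") == "month"


def _time_match(text: str, start: int, end: int, value: str, grain: str) -> Dict[str, Any]:
    return {
        "body": text[start:end], "start": start, "end": end, "dim": "time",
        "value": {"value": value, "grain": grain, "type": "value"}, "latent": False,
    }


def merge_year_month(text: str, matches: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Combine "<year> <month>" pairs, and lone years, into time matches."""
    ordered = sorted(matches, key=lambda m: m["start"])
    result: List[Dict[str, Any]] = []
    i = 0
    while i < len(ordered):
        cur = ordered[i]
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        year = _year_of(cur)
        adjacent_month = (
            nxt is not None
            and _is_month(nxt)
            and text[cur["end"]:nxt["start"]].strip() == ""
        )
        if year is not None and adjacent_month:
            # Keep the month, day, time and offset Duckling resolved; replace only the year.
            value = f"{year:04d}{nxt['value']['value'][4:]}"
            result.append(_time_match(text, cur["start"], nxt["end"], value, "month"))
            i += 2
        elif year is not None:
            # Lone year: assume January 1st at +00:00, like the offsets seen in this thread.
            value = f"{year:04d}-01-01T00:00:00.000+00:00"
            result.append(_time_match(text, cur["start"], cur["end"], value, "year"))
            i += 1
        else:
            result.append(cur)
            i += 1
    return result


# The two matches from the /parse response, with the alternative "values" list dropped.
response = [
    {"body": "2020", "start": 14, "end": 18, "dim": "number",
     "value": {"value": 2020, "type": "value"}, "latent": False},
    {"body": "jan", "start": 19, "end": 22, "dim": "time",
     "value": {"value": "2023-01-01T00:00:00.000+00:00", "grain": "month", "type": "value"},
     "latent": False},
]
print(merge_year_month("magazine_name 2020 jan ", response))
```

Treating every four-digit number as a year is a blunt heuristic; quantities such as "2020 copies" would be misread, so the range and the adjacency check are the knobs to tighten.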

It works fine if I pass magazine_name jan 2020 as the text.
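Since the month-first order works, another hypothetical workaround is to rewrite "<year> <month>" as "<month> <year>" before the text reaches Duckling. This is a sketch assuming English month names and years from 1900 to 2099. It is not a Rasa option, it collapses the whitespace between the two words to a single space, and any character offsets Duckling returns then refer to the rewritten text.

```python
import re

# English month names and common abbreviations (assumption: English input only).
_MONTH = (
    r"jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?"
)
# A four-digit year (1900-2099) followed by a month name, e.g. "2020 jan".
_YEAR_THEN_MONTH = re.compile(rf"\b((?:19|20)\d{{2}})\s+({_MONTH})\b", re.IGNORECASE)


def month_before_year(text: str) -> str:
    """Rewrite "2020 jan" as "jan 2020", the order Duckling parsed as one time entity."""
    return _YEAR_THEN_MONTH.sub(lambda m: f"{m.group(2)} {m.group(1)}", text)


print(month_before_year("magazine_name 2020 jan"))
```

This only helps Example 1; a year on its own (Example 2) still comes back from Duckling as a number.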

TL;DR: Rasa is picking up exactly what Duckling parses.