Duckling mistaking id for datetime

Polaris000 · September 4, 2021, 11:33am

In my bot, I’m using CRF for entity extraction along with DIET for intent classification. I’m also using Duckling to extract time-based entities. The issue arises when a message contains an entity that is a MongoDB ObjectId: although CRF extracts the entity correctly, Duckling sometimes mistakes part of it for a datetime.

I tried adding a prefix to the entity and a regex for it. The user message would look like this:

some_text MONGOID//612a1731af13ee4e235e5ead

Duckling sometimes sees this as a datetime. It extracts “1731” as the year, and so on. Specifically, this is the result:

{
	'start': 99,
	'end': 107,
	'text': '612a1731',
	'value': '1731-01-01T06:12:00.000-07:53',
	'confidence': 1.0,
	'additional_info': {
		'values': [{
			'value': '1731-01-01T06:12:00.000-07:53',
			'grain': 'minute',
			'type': 'value'
		}, {
			'value': '1731-01-02T06:12:00.000-07:53',
			'grain': 'minute',
			'type': 'value'
		}, {
			'value': '1731-01-03T06:12:00.000-07:53',
			'grain': 'minute',
			'type': 'value'
		}],
		'value': '1731-01-01T06:12:00.000-07:53',
		'grain': 'minute',
		'type': 'value'
	},
	'entity': 'time',
	'extractor': 'DucklingEntityExtractor'
}

My pipeline is this:

pipeline:
   - name: WhitespaceTokenizer
   - name: RegexFeaturizer
   - name: LexicalSyntacticFeaturizer
   - name: CountVectorsFeaturizer
     analyzer: char_wb
     min_ngram: 1
     max_ngram: 4
   - name: CRFEntityExtractor
     "BILOU_flag": True
     "features": [["low", "title", "upper"],[
      "bias",
      "low",
      "prefix5",
      "prefix2",
      "suffix5",
      "suffix3",
      "suffix2",
      "upper",
      "title",
      "digit",
      "pattern",],["low", "title", "upper"]]
     "max_iterations": 50
     "featurizers": []
   - name: DucklingEntityExtractor
     url: "http://localhost:8000"
     locale: "en_US"
     dimensions: ["time", "duration", "ordinal"]
   - name: DIETClassifier
     epochs: 500
     entity_recognition: False
     constrain_similarities: true
   - name: EntitySynonymMapper
   - name: FallbackClassifier
     threshold: 0.7
     ambiguity_threshold: 0.1

Since there’s no issue in my trainable components like CRF or DIET, and I can’t exactly train Duckling, any suggestions for what I can do?

Thanks in advance.

Gehova · September 6, 2021, 10:07pm

If Duckling misreading the entity is the only problem you have, try adding “number” to Duckling’s dimensions in the pipeline. This way “1731” will be extracted as a number, not a datetime.

Polaris000 · September 7, 2021, 9:43am

Thanks for responding @Gehova. That would be one solution, but is there some way to prevent Duckling from extracting anything here in the first place? The mongoid above is neither a datetime nor a numeric value.

Can I maybe build a custom component that would pre-extract such entities before Duckling is able to extract an incorrect value?

Gehova · September 7, 2021, 3:49pm

If the CRF can detect the mongoid correctly, you can customize the Duckling component to remove the mongoid from the text before sending it to the Duckling server.
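
A minimal sketch of the masking step, in plain Python and independent of Rasa. It assumes the IDs are 24-character hex ObjectIds, optionally preceded by the MONGOID// prefix mentioned earlier in the thread; the names MONGO_ID and mask_mongo_ids are made up for illustration. Each ID is replaced with spaces of the same length, so any start/end offsets Duckling returns for the rest of the message should still line up with the original text.

```python
import re

# Assumption: IDs are 24-character hex MongoDB ObjectIds, optionally
# preceded by the "MONGOID//" prefix used in this thread.
MONGO_ID = re.compile(r"(?:MONGOID//)?\b[0-9a-fA-F]{24}\b")


def mask_mongo_ids(text: str) -> str:
    """Replace each ObjectId (and its prefix) with spaces of equal length.

    Keeping the length unchanged means character offsets in the masked
    text are identical to offsets in the original message.
    """
    return MONGO_ID.sub(lambda m: " " * len(m.group()), text)


message = "some_text MONGOID//612a1731af13ee4e235e5ead tomorrow at 5pm"
masked = mask_mongo_ids(message)

# The ID is gone, the rest of the message is untouched.
print(repr(masked))
```

In a bot, this function would be called on the message text inside a custom component that wraps or subclasses DucklingEntityExtractor, just before the request to the Duckling server is made. Which method to override depends on the Rasa version, so that part is left out here.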

Polaris000 · September 8, 2021, 4:59am

That could work. Could you tell me how I can go about doing this?

Polaris000 · September 18, 2021, 11:19am

Hey @Gehova. Could you get back to me on this? Thanks.
