Duckling mistaking an ID for a datetime

In my bot, I’m using CRF for entity extraction and DIET for intent classification. I’m also using Duckling to extract time-based entities. The issue arises when a message contains a MongoDB ObjectId as an entity. Although CRF extracts the entity correctly, Duckling sometimes mistakes it for a datetime.

I tried adding a prefix to the entity and a regex for it. The user message looks like this:

some_text MONGOID//612a1731af13ee4e235e5ead
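The regex itself isn't shown in the post. As a minimal sketch, assuming the prefix is literally "MONGOID//" and the ID is a standard 24-character hexadecimal ObjectId, a Python pattern that finds the ID and its character offsets might look like this (MONGO_ID_PATTERN and find_mongo_ids are hypothetical names):

```python
import re

# Assumption: the prefix is literally "MONGOID//" followed by a standard
# 24-character hexadecimal MongoDB ObjectId. \b stops a match inside a longer
# run of word characters.
MONGO_ID_PATTERN = re.compile(r"MONGOID//([0-9a-fA-F]{24})\b")


def find_mongo_ids(text):
    """Return (start, end, value) tuples for every prefixed ObjectId in text.

    The offsets cover only the 24-character ID, not the "MONGOID//" prefix.
    """
    return [(m.start(1), m.end(1), m.group(1)) for m in MONGO_ID_PATTERN.finditer(text)]


if __name__ == "__main__":
    message = "some_text MONGOID//612a1731af13ee4e235e5ead"
    print(find_mongo_ids(message))  # [(19, 43, '612a1731af13ee4e235e5ead')]
```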

Duckling sometimes parses this as a datetime: it takes “1731” as the year, and so on. This is the exact result:

{
	'start': 99,
	'end': 107,
	'text': '612a1731',
	'value': '1731-01-01T06:12:00.000-07:53',
	'confidence': 1.0,
	'additional_info': {
		'values': [{
			'value': '1731-01-01T06:12:00.000-07:53',
			'grain': 'minute',
			'type': 'value'
		}, {
			'value': '1731-01-02T06:12:00.000-07:53',
			'grain': 'minute',
			'type': 'value'
		}, {
			'value': '1731-01-03T06:12:00.000-07:53',
			'grain': 'minute',
			'type': 'value'
		}],
		'value': '1731-01-01T06:12:00.000-07:53',
		'grain': 'minute',
		'type': 'value'
	},
	'entity': 'time',
	'extractor': 'DucklingEntityExtractor'
}

My pipeline is this:

pipeline:
   - name: WhitespaceTokenizer
   - name: RegexFeaturizer
   - name: LexicalSyntacticFeaturizer
   - name: CountVectorsFeaturizer
     analyzer: char_wb
     min_ngram: 1
     max_ngram: 4
   - name: CRFEntityExtractor
     "BILOU_flag": True
     "features": [["low", "title", "upper"],[
      "bias",
      "low",
      "prefix5",
      "prefix2",
      "suffix5",
      "suffix3",
      "suffix2",
      "upper",
      "title",
      "digit",
      "pattern",],["low", "title", "upper"]]
     "max_iterations": 50
     "featurizers": []
   - name: DucklingEntityExtractor
     url: "http://localhost:8000"
     locale: "en_US"
     dimensions: ["time", "duration", "ordinal"]
   - name: DIETClassifier
     epochs: 500
     entity_recognition: False
     constrain_similarities: true
   - name: EntitySynonymMapper
   - name: FallbackClassifier
     threshold: 0.7
     ambiguity_threshold: 0.1

Since my trainable components (CRF and DIET) work correctly and I can’t train Duckling, do you have any suggestions for what I can do?

Thanks in advance.


If Duckling mistaking the entity is your only problem, try adding “number” to Duckling’s dimensions in the pipeline. That way “1731” will be extracted as a number rather than a datetime.


Thanks for responding, @Gehova. That is one solution, but is there a way to stop Duckling from extracting anything here in the first place? The Mongo ID above is neither a datetime nor a number.

Could I build a custom component that extracts such entities before Duckling has a chance to extract an incorrect value?
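A related option, sketched here as an alternative to pre-extraction rather than a pre-extraction component itself, is to let both extractors run and then discard any Duckling entity whose span overlaps an entity CRF has already found. This is plain Python over Rasa-style entity dicts (keys "entity", "start", "end", "extractor", as in the output above). The helper names and the "mongo_id" entity name are hypothetical, and wiring this into a custom Rasa component placed after DucklingEntityExtractor is not shown, since the component API differs between Rasa versions.

```python
# Hypothetical helpers; "mongo_id" is an assumed name for the CRF-extracted entity.
def spans_overlap(a_start, a_end, b_start, b_end):
    """True if the half-open ranges [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def drop_duckling_overlaps(entities, protected_entity="mongo_id",
                           extractor="DucklingEntityExtractor"):
    """Drop entities produced by `extractor` that overlap any `protected_entity` span."""
    protected = [(e["start"], e["end"]) for e in entities
                 if e.get("entity") == protected_entity]
    return [
        e for e in entities
        if not (e.get("extractor") == extractor
                and any(spans_overlap(e["start"], e["end"], p_start, p_end)
                        for p_start, p_end in protected))
    ]


if __name__ == "__main__":
    # Illustrative entities for "some_text MONGOID//612a1731af13ee4e235e5ead";
    # the Duckling entry is hand-written, not real server output.
    entities = [
        {"entity": "mongo_id", "start": 19, "end": 43,
         "value": "612a1731af13ee4e235e5ead", "extractor": "CRFEntityExtractor"},
        {"entity": "time", "start": 19, "end": 27,
         "value": "1731-01-01T06:12:00.000-07:53", "extractor": "DucklingEntityExtractor"},
    ]
    print(drop_duckling_overlaps(entities))  # only the mongo_id entity remains
```

Time entities elsewhere in the message are kept, because only spans that intersect the protected ID are dropped.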

If CRF detects the Mongo ID correctly, you can customize the Duckling component to remove the ID from the text before sending it to the Duckling server.
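A minimal sketch of that masking step, assuming the ID spans come from the "start"/"end" offsets of the entity CRF extracted: overwrite each span with spaces of the same length, then send the masked text to Duckling. mask_spans is a hypothetical helper, and the part where a customized Duckling component calls it before contacting the server is not shown.

```python
def mask_spans(text, spans, fill=" "):
    """Replace each (start, end) span in text with `fill`, keeping the length unchanged.

    Because the length is preserved, character offsets that Duckling returns
    for the masked text still line up with the original message.
    """
    if len(fill) != 1:
        raise ValueError("fill must be a single character to preserve offsets")
    chars = list(text)
    for start, end in spans:
        # Same-length replacement, so later offsets do not shift.
        chars[start:end] = fill * (end - start)
    return "".join(chars)


if __name__ == "__main__":
    message = "some_text MONGOID//612a1731af13ee4e235e5ead"
    # (19, 43) is the span of the 24-character ID in this message.
    print(repr(mask_spans(message, [(19, 43)])))
```

Replacing rather than deleting the characters is the important choice here: deleting the ID would shift every later offset, so any genuine time entity after the ID would point at the wrong text.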

That could work. Could you tell me how to go about it?

Hey @Gehova, could you get back to me on this? Thanks.