spaCy: return a numerical value from numbers written as text

Hello all!

I am using spaCy to extract “CARDINAL” entities from the user’s input. Currently:

  • If the user writes “3” I get a value of “3”.
  • If the user writes “three” I get a value of “three”.

I need to get the same numerical value (“3”) when the user inputs “three” as well. This is necessary because I need to integrate with an API that expects integers in that slot, and I cannot do the translation from string (e.g. “twenty five”) to int (e.g. 25) myself. Does anyone have any idea how to achieve this?

My config is the following:

- name: SpacyTokenizer
- name: SpacyEntityExtractor
  dimensions: ["CARDINAL"]
- name: SpacyFeaturizer
  pooling: mean
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100

Thanks for the help!

I would use Duckling for this. There’s a good blog on this topic here.

Hello! Thanks for the help! It says that Rasa NLU uses the REST interface of Duckling. What does that mean exactly? Does it mean that on every input Rasa sends a request to the server running Duckling and gets the required entities? Does Rasa do this automatically (as long as I include this in the config.yml) and I get the annotated entities in the Rasa response? Do I need to perform this communication between Rasa and Duckling myself somehow?

Thanks!

Does it mean that on every input Rasa sends a request to the server running Duckling and gets the required entities?

Yes

Does Rasa do this automatically (as long as I include this in the config.yml) and I get the annotated entities in the Rasa response?

Yes, and you can use a custom action to review the entity extraction (which you may want to do if you could have multiple entity extractors).

Do I need to perform this communication between Rasa and Duckling myself somehow?

You need to have Duckling running and point to it in the config.yml. Read the duckling section.
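To make the "REST interface" part more concrete, here is a rough Python sketch of the kind of round trip Rasa performs for you on each message. Rasa does this internally, so you would not normally write it yourself. The URL and port are assumptions (a Duckling server running locally on port 8000), the function names are made up for illustration, and the response shape is inferred from the 'value'/'type' structure that Rasa reports under additional_info.

```python
import json
from urllib import parse, request

# Assumption: a Duckling server is running locally on port 8000.
DUCKLING_URL = "http://localhost:8000"


def build_parse_request(text, locale="en_US", dims=("number",)):
    """Build the form-encoded POST for Duckling's /parse endpoint."""
    payload = parse.urlencode(
        {"text": text, "locale": locale, "dims": json.dumps(list(dims))}
    ).encode("utf-8")
    return request.Request(f"{DUCKLING_URL}/parse", data=payload, method="POST")


def extract_numbers(duckling_response):
    """Pull the numeric values out of a decoded Duckling response (a list of dicts)."""
    return [
        item["value"]["value"]
        for item in duckling_response
        if item.get("dim") == "number"
    ]


def parse_numbers(text):
    """Send the text to Duckling and return the numbers it found (needs a live server)."""
    with request.urlopen(build_parse_request(text), timeout=5) as resp:
        return extract_numbers(json.loads(resp.read().decode("utf-8")))
```

With Duckling running, calling parse_numbers("twenty five") would go through the same request/response cycle; inside Rasa, the DucklingEntityExtractor handles this and attaches the results to the message's entities.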


Thanks for the help, really appreciated! One last question: I got Duckling set up and it’s parsing numbers and times. My problem now is that the input “20221612” is parsed as a number by Duckling, while I want it to be parsed as a time. I have also included these sample utterances in my training set:

- coming on [20221216](time)
- leaving on [20220814](time)

but what I still get is:

'additional_info': {'value': 20221612, 'type': 'value'}, 'entity': 'number', 'extractor': 'DucklingEntityExtractor'},
{'entity': 'number', 'start': 0, 'end': 8, 'confidence_entity': 0.5148515105247498, 'value': '20221612', 'extractor': 'DIETClassifier'}

Any ideas?

Yes, this is part of the reason I suggested you might want to have a custom action where you can resolve the entity to slot conversion.
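As a minimal sketch of what that resolution step could look like, the helper below takes the entity list (in the shape shown in the output above) and treats an eight-digit "number" as a date when it parses as one. The function name, the "time" and "number" slot names, and the list of accepted date formats are all assumptions for illustration; it is plain Python so it can be tried without Rasa installed.

```python
import re
from datetime import datetime


def resolve_time_or_number(entities, date_formats=("%Y%m%d",)):
    """Decide which slot an extracted entity should fill.

    Hypothetical helper: an 8-digit 'number' that parses as a date under one
    of date_formats is returned for the 'time' slot; otherwise the first
    'number' entity is returned unchanged. Returns None if nothing matches.
    """
    numbers = [e for e in entities if e.get("entity") == "number"]

    for ent in numbers:
        raw = str(ent.get("value", ""))
        if not re.fullmatch(r"\d{8}", raw):
            continue
        for fmt in date_formats:
            try:
                parsed = datetime.strptime(raw, fmt).date()
            except ValueError:
                continue  # not a valid date in this format, try the next one
            return {"slot": "time", "value": parsed.isoformat()}

    if numbers:
        return {"slot": "number", "value": numbers[0]["value"]}
    return None
```

In a custom action you could pass tracker.latest_message["entities"] to this function and turn the result into a SlotSet event. Note that “20221612” is not a valid date as year-month-day (there is no month 16), so it only resolves to a time if you also allow a year-day-month format such as "%Y%d%m".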