Remove whitespace from entity

josenlp90 · March 18, 2022, 2:34pm

Hello everyone!

I was wondering if someone could help me with the following issue: in Rasa 3.0.9, I need to create a model that can extract an entity consisting of exactly 7 or 8 digits, with no whitespace inside the extracted value.

For example, given the input “1234 567”, the entity’s value should be extracted as “1234567”. The same applies to inputs such as “ 12345678”, “12345678 ” or “1 2 3 4 5 6 7 8”: in each of these cases the value should be “12345678”, with no leading, trailing or inner spaces.

Next message:
30 000 000
{
  "text": "30 000 000",
  "intent": {
    "name": "identificarse",
    "confidence": 1.0
  },
  "entities": [
    {
      "entity": "dni",
      "start": 0,
      "end": 10,
      "value": "30 000 000",
      "extractor": "RegexEntityExtractor"
    }
  ],
  "text_tokens": [
    [
      0,
      2
    ],
    [
      3,
      6
    ],
    [
      7,
      10
    ]
  ],
  "intent_ranking": [
    {
      "name": "identificarse",
      "confidence": 1.0
    },
    {
      "name": "saludo",
      "confidence": 6.369950678042358e-10
    }
  ],
  "response_selector": {
    "all_retrieval_intents": [],
    "default": {
      "response": {
        "responses": null,
        "confidence": 0.0,
        "intent_response_key": null,
        "utter_action": "utter_None"
      },
      "ranking": []
    }
  }
}

I tried the following regex, but still the entity’s value is extracted with spaces:

(?<!\d)(?<!\d )(?:(?:\d *){7}|(?:\d *){8})(?<! )(?! ?\d)

I also tried removing the Whitespace Tokenizer from the pipeline, but then the model throws an error when I try to train it.

Is it possible to solve this type of extraction with a regex or some pipeline component (like a tokenizer or a featurizer)? Or is it something that can only be solved with custom actions?

Thank you so much for your help!

ChrisRahme · March 18, 2022, 4:10pm

The easiest way is to do it in a custom action: extract the slot with the spaces, as happens now, and then remove the spaces from the string.
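
As a rough sketch of the "remove the spaces" step, something like the following could work. The helper name `normalize_dni` is made up for illustration, and the 7-or-8-digit check is taken from the question rather than from anything Rasa provides.

```python
import re
from typing import Optional


def normalize_dni(raw: str) -> Optional[str]:
    """Remove all whitespace and keep the result only if it is 7 or 8 digits.

    Hypothetical helper: the name and the None-on-failure convention are
    assumptions for this sketch, not part of Rasa.
    """
    digits = re.sub(r"\s+", "", raw)
    # fullmatch rejects anything that is not exactly 7 or 8 ASCII digits
    if re.fullmatch(r"[0-9]{7,8}", digits):
        return digits
    return None


print(normalize_dni("30 000 000"))       # value extracted in the question
print(normalize_dni("1 2 3 4 5 6 7 8"))
print(normalize_dni("12 34"))            # too short, so None
```

Inside a custom action, this would presumably be called on the value read with `tracker.get_slot("dni")`, with the cleaned string written back through a `SlotSet` event. If the slot is filled by a form, a `validate_dni` method on a `FormValidationAction` subclass is another place where it could go.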

Another, harder solution is to write a custom component.
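
The core of such a component would likely be a post-processing step that runs after the entity extractor and rewrites the entity values. The sketch below shows only that step, on plain dictionaries shaped like the `entities` list in the parse output above. The function name `strip_entity_whitespace` is hypothetical, and the Rasa-specific wiring (registering the class as a pipeline component and reading and writing the entities on each message) is deliberately left out and would need to follow the Rasa 3.x custom component documentation.

```python
import re
from typing import Any, Dict, List


def strip_entity_whitespace(
    entities: List[Dict[str, Any]], entity_name: str = "dni"
) -> List[Dict[str, Any]]:
    """Return a copy of the entity list with whitespace removed from the
    values of entities named `entity_name`.

    Hypothetical helper for illustration; not a Rasa API.
    """
    cleaned = []
    for entity in entities:
        entity = dict(entity)  # shallow copy, so the input is not mutated
        value = entity.get("value")
        if entity.get("entity") == entity_name and isinstance(value, str):
            entity["value"] = re.sub(r"\s+", "", value)
        # "start" and "end" are left alone: they still index the original text
        cleaned.append(entity)
    return cleaned


entities = [
    {
        "entity": "dni",
        "start": 0,
        "end": 10,
        "value": "30 000 000",
        "extractor": "RegexEntityExtractor",
    }
]
print(strip_entity_whitespace(entities))
```

Compared with the custom action, this keeps the cleanup inside the NLU pipeline, so the entity would already be normalized by the time it reaches slots or stories, at the cost of writing and maintaining the component.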
