Unable to use regex feature

I have 3 entities that contain numerical data:

## intent:sales_data

## regex:EV_id
- ^\d{4}$

## regex:product_id
- ^\d{8}$

## regex:Transaction_id
- ^\d{12}$

My config file:

pipeline:
  - name: "SpacyNLP"
    model: "en_core_web_lg"
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
    features: [["low", "title", "upper"], ["bias", "low", "prefix5", "prefix2", "suffix5", "suffix3", "suffix2", "upper", "title", "digit", "pattern"], ["low", "title", "upper"]]
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
  - name: "EmbeddingIntentClassifier"
    intent_tokenization_flag: true
    intent_split_symbol: "+"
    epochs: 50
  - name: DucklingHTTPExtractor
    url: http://localhost:8000
    dimensions:
      - email
      - phone-number
      - time
      - amount-of-money

policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy

But the regex features do not seem to take effect: even when I enter a 5-digit number, it is detected as EV_id. Please help me with this issue.

@akelad @Juste @JulianGerhard

Hi @shubham

I built your bot based on your setup. This is my parsing output for product_id:

POST http://localhost:5005/model/parse

{
	"text": "10000413"
}

{
    "intent": {
        "name": "sales_data",
        "confidence": 0.9558028579
    },
    "entities": [
        {
            "start": 0,
            "end": 8,
            "value": "10000413",
            "entity": "product_id",
            "confidence": 0.853251155,
            "extractor": "CRFEntityExtractor"
        }
    ],
    "intent_ranking": [
        {
            "name": "sales_data",
            "confidence": 0.9558028579
        },
        {
            "name": "goodbye",
            "confidence": 0.0675825328
        },
        {
            "name": "greet",
            "confidence": 0
        },
        {
            "name": "mood_great",
            "confidence": 0
        },
        {
            "name": "deny",
            "confidence": 0
        },
        {
            "name": "affirm",
            "confidence": 0
        }
    ],
    "text": "10000413"
} 

This is my parsing output for ev_id:

POST http://localhost:5005/model/parse

{
	"text": "2213"
}

{
    "intent": {
        "name": "sales_data",
        "confidence": 0.9566286206
    },
    "entities": [
        {
            "start": 0,
            "end": 4,
            "value": "2213",
            "entity": "ev_id",
            "confidence": 0.8784737378,
            "extractor": "CRFEntityExtractor"
        }
    ],
    "intent_ranking": [
        {
            "name": "sales_data",
            "confidence": 0.9566286206
        },
        {
            "name": "goodbye",
            "confidence": 0.0442600176
        },
        {
            "name": "greet",
            "confidence": 0
        },
        {
            "name": "mood_great",
            "confidence": 0
        },
        {
            "name": "deny",
            "confidence": 0
        },
        {
            "name": "affirm",
            "confidence": 0
        }
    ],
    "text": "2213"
}

My modifications:

I simply used the CRFEntityExtractor without feature specifications and I used the following stories:

## product_id
* sales_data{"product_id": "56435678"}
 - utter_goodbye
 
## ev_id
* sales_data{"ev_id": "2481"}
 - utter_goodbye

Maybe you forgot to include the mentioned entities when triggering the intent? Another thing: the training data you provided for those entities is rather sparse. For example, ev_id always ends in 13 (I don’t know whether this is intended).

Regards

I get the same result when I test it with 1234 (a 4-digit number) or 54688458 (an 8-digit number). But when I test it with 54215 (a 5-digit number), I still get EV_id as the entity, and with 5451755 (which is not an 8-digit number) I get product_id as the entity.

Hi @shubham,

according to the documentation:

Regex features don’t define entities nor intents! They simply provide patterns to help the classifier recognize entities and related intents. Hence, you still need to provide intent & entity examples as part of your training data!

I think what you want to achieve is to parse an ev_id, product_id or transaction_id and use a regex to decide which of them it is. The regex feature does not do that: it only gives the classifier a hint and does not enforce the pattern.
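
To make that concrete, here is a minimal sketch in plain Python (not Rasa code) of what a regex feature roughly amounts to: one true/false flag per pattern for a given token. The `pattern_flags` helper is a hypothetical name for illustration, and the patterns are copied from the training data above. For a 5-digit input every flag is false, so the entity extractor presumably falls back on its other features (such as `digit`) and guesses the closest label it has seen.

```python
import re

# Patterns copied from the training data in the question.
PATTERNS = {
    "EV_id": r"^\d{4}$",
    "product_id": r"^\d{8}$",
    "Transaction_id": r"^\d{12}$",
}


def pattern_flags(token: str) -> dict:
    """Return one boolean per pattern (hypothetical helper, not a Rasa API)."""
    return {
        name: re.search(regex, token) is not None
        for name, regex in PATTERNS.items()
    }


# Show which patterns fire for a 4-digit, a 5-digit and an 8-digit token.
for token in ["2213", "54215", "10000413"]:
    print(token, pattern_flags(token))
```

The flags are only inputs to the model; nothing in them stops the model from labelling "54215" anyway.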

How about using one intent and the Duckling extractor to extract the number, and then validating it in a custom action or FormAction to determine its use case?

I would suggest using Duckling to extract numbers instead, and then filling your slots from custom actions, with some regex checking to decide which slot to fill.
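
The regex check for choosing the slot could look roughly like the sketch below. It is plain Python that you would call from inside your custom action; `slot_for_number` and `SLOT_PATTERNS` are hypothetical names, and the slot names are assumed to match the entity names from the question. Two caveats: the Duckling dimensions in the config above do not include a number dimension, so that would need to be added, and if the value arrives as a parsed number rather than the raw text, any leading zeros are already lost.

```python
import re
from typing import Optional

# Slot names are assumed to match the entity names used in the question.
SLOT_PATTERNS = [
    ("EV_id", re.compile(r"\d{4}")),
    ("product_id", re.compile(r"\d{8}")),
    ("Transaction_id", re.compile(r"\d{12}")),
]


def slot_for_number(value) -> Optional[str]:
    """Return the slot whose pattern matches the whole value, or None."""
    text = str(value).strip()
    for slot, pattern in SLOT_PATTERNS:
        # fullmatch requires the entire string to match, so "54215"
        # (5 digits) matches none of the patterns.
        if pattern.fullmatch(text):
            return slot
    return None
```

In the custom action you would fill the returned slot with the value, and ask the user again when the function returns `None`.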

@shubham, I had a similar issue and wrote a custom component, a "true" RegexEntityExtractor. Hope this helps! See: RASA Regex Entity Extraction - Naoko - Medium
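
For readers who only want the idea, here is a rough plain-Python sketch of strict regex entity extraction. It is not the code from the linked article and it is not wired into Rasa's component API; `extract_entities` and the `"regex_sketch"` extractor label are made up for illustration. It returns entity dicts shaped like the ones in the parse output earlier in this thread.

```python
import re

# Lookarounds make sure a match is a standalone run of digits, so a
# 5-digit number does not yield a 4-digit match inside it.
ENTITY_PATTERNS = {
    "EV_id": re.compile(r"(?<!\d)\d{4}(?!\d)"),
    "product_id": re.compile(r"(?<!\d)\d{8}(?!\d)"),
    "Transaction_id": re.compile(r"(?<!\d)\d{12}(?!\d)"),
}


def extract_entities(text: str) -> list:
    """Return entity dicts for every strict pattern match in the text."""
    entities = []
    for name, pattern in ENTITY_PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append({
                "start": match.start(),
                "end": match.end(),
                "value": match.group(),
                "entity": name,
                "extractor": "regex_sketch",  # hypothetical label
            })
    # Order the entities by their position in the text.
    return sorted(entities, key=lambda e: e["start"])
```

Unlike the regex feature, this only ever returns a value that matches a pattern exactly, so a 5-digit number produces no entity at all.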


This looks pretty cool @naoko. I’ll give it a try today.


Hello @JulianGerhard, did things work out for you?