Lookup Table didn't work for RegexEntityExtractor

Liangda-w · December 15, 2020, 9:18pm

Hi Rasa Community,

I use a lookup table full of uppercase words for RegexEntityExtractor. The NLU Training Data docs say that lookup tables are case insensitive. However, after cross-validation it turns out that my RegexEntityExtractor does not identify those entities when they appear in lowercase. It seems that the lookup table is case sensitive? I’ve defined a couple of training examples for the entities in the lookup table and set case_sensitive of RegexFeaturizer to False.

This problem has confused me for a while. Can someone help me solve it?

This is what my pipeline looks like:

language: de

pipeline:
   - name: WhitespaceTokenizer
   - name: RegexFeaturizer
     case_sensitive: False
   - name: RegexEntityExtractor
     case_sensitive: False
   - name: LexicalSyntacticFeaturizer
   - name: CountVectorsFeaturizer
   - name: CountVectorsFeaturizer
     analyzer: char_wb
     min_ngram: 1
     max_ngram: 4
   - name: "DucklingEntityExtractor"
     url: "http://localhost:8000"
     dimensions: ["time", "number", "amount-of-money"]
     locale: "de_DE"
     timezone: "Europe/Berlin"
   - name: "CRFEntityExtractor"
   - name: DIETClassifier
     epochs: 100
   - name: EntitySynonymMapper
   - name: ResponseSelector
     epochs: 100
   - name: FallbackClassifier
     threshold: 0.75
     ambiguity_threshold: 0.1

thanks and regards
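
A minimal sketch of what case-insensitive lookup matching amounts to, written in plain Python with the standard `re` module. It is not Rasa's implementation; the entries in `LOOKUP`, the `city` entity name, and the helper names `build_pattern` and `extract` are hypothetical and only illustrate the behaviour the question expects when case_sensitive is False.

```python
import re

# Hypothetical lookup table, stored in uppercase as described in the post.
LOOKUP = ["BERLIN", "HAMBURG", "MÜNCHEN"]


def build_pattern(entries, case_sensitive=False):
    """Compile one regex that matches any lookup entry as a whole word."""
    # Longest entries first, so multi-word values win over their prefixes.
    ordered = sorted(entries, key=len, reverse=True)
    alternatives = "|".join(re.escape(entry) for entry in ordered)
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(rf"\b(?:{alternatives})\b", flags)


def extract(text, pattern, entity):
    """Return every match as a dict with start, end, value and entity."""
    return [
        {"start": m.start(), "end": m.end(), "value": m.group(0), "entity": entity}
        for m in pattern.finditer(text)
    ]


text = "ich fahre morgen nach berlin"
insensitive = extract(text, build_pattern(LOOKUP), "city")
sensitive = extract(text, build_pattern(LOOKUP, case_sensitive=True), "city")
```

With the case-insensitive pattern the lowercase "berlin" is found; with the case-sensitive one nothing matches, which is the symptom described above.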

riya.shah · June 28, 2021, 12:19pm

Hey, I’m facing the same issue. Did you find out the reason for this? Or how did you fix it?

nik202 · June 28, 2021, 1:13pm

@riya.shah Heya! You are not able to fetch the data using the lookup table, right?

riya.shah · June 28, 2021, 1:31pm

Hey @nik202 Yes. I have a lookup table like this:

- lookup: insurance_provider
  examples: |
    - HDFC ERGO
    - HDFC
    - Tata AIG
    - Tata
    - ICICI
    - ICICI Lombard

And it is not able to extract the entity. This is my DIETClassifier errors.json:

{
  "text": "i think it was icici",
  "entities": [
    {
      "start": 15,
      "end": 20,
      "value": "ICICI",
      "entity": "insurance_provider"
    }
  ],
  "predicted_entities": []
},
{
  "text": "umm from tata",
  "entities": [
    {
      "start": 9,
      "end": 13,
      "value": "Tata",
      "entity": "insurance_provider"
    }
  ],
  "predicted_entities": []
}
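
A quick sanity check for an error report like this one, sketched in plain Python: it confirms that each annotated span lines up with the text and that the missed value is in the lookup table once case is ignored. The two entries and the lookup values are copied from this post; the helper name `check_spans` is hypothetical.

```python
import json

# The two entries from the errors.json excerpt above, wrapped in a list.
ERRORS = json.loads("""
[
  {"text": "i think it was icici",
   "entities": [{"start": 15, "end": 20, "value": "ICICI",
                 "entity": "insurance_provider"}],
   "predicted_entities": []},
  {"text": "umm from tata",
   "entities": [{"start": 9, "end": 13, "value": "Tata",
                 "entity": "insurance_provider"}],
   "predicted_entities": []}
]
""")

# Lookup table values as posted.
LOOKUP = ["HDFC ERGO", "HDFC", "Tata AIG", "Tata", "ICICI", "ICICI Lombard"]


def check_spans(errors, lookup):
    """Compare each annotated span with its value and with the lookup table."""
    known = {value.casefold() for value in lookup}
    report = []
    for item in errors:
        for ent in item["entities"]:
            span = item["text"][ent["start"]:ent["end"]]
            report.append({
                "span": span,
                # Do the offsets cut out the annotated value, ignoring case?
                "offsets_ok": span.casefold() == ent["value"].casefold(),
                # Is the span one of the lookup entries, ignoring case?
                "in_lookup": span.casefold() in known,
            })
    return report


report = check_spans(ERRORS, LOOKUP)
```

If both flags are true for every row, the misses are not caused by misaligned annotations or by values missing from the lookup table.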

nik202 · June 28, 2021, 1:41pm

@riya.shah What issue are you getting? Please share some other supporting files.

riya.shah · June 28, 2021, 1:52pm

Hey, I am trying to extract the name of insurance companies from the user’s message.

This is the config file…

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: en

pipeline:
  - name: custom_nlu_components.CustomTranslator.CustomTranslator
    #    Required: translate_url
    translate_url: <translate_url>
    "source_language": auto
    "target_language": en
  - name: ConveRTTokenizer
  - name: ConveRTFeaturizer
    model_url: <model_url>
  - name: RegexFeaturizer
    case_sensitive: False
  - name: LexicalSyntacticFeaturizer
  - name: DucklingEntityExtractor
    url: "http://localhost:8000"
    dimensions: [ "time", "duration", "number" ]
    timeout: 5
    timezone: "Asia/Kolkata"
    locale: "en_IN"
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    random_seed: 73
    model_confidence: linear_norm
    constrain_similarities: True
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    random_seed: 73
    model_confidence: linear_norm
    constrain_similarities: True
  - name: FallbackClassifier
    threshold: 0.45

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
    random_seed: 73
    model_confidence: linear_norm
    constrain_similarities: True
  - name: RulePolicy

These are some of the training examples:

  - intent: in_share_vendor_details
    examples: |
      - Insurance agent from [ICICI](insurance_provider) had called me
      - its from [LIC](insurance_provider)
      - i can't recall , probably through [icici](insurance_provider)
      - I bought a policy from [Tata](insurance_provider)
      - [HDFC Ergo](insurance_provider)

These are test examples:

  - intent: in_share_vendor_details
    examples: |
      - from [Tata AIG](insurance_provider)
      - I have renewed from [hdfc](insurance_provider) life insurance
      - I think it was [ICICI](insurance_provider)
      - i bought it from [tata aig](insurance_provider)
      - Umm from [Tata](insurance_provider)

These examples are present in the lookup table as well. When I run rasa test, entity extraction fails for a lot of examples. Let me know if you need any other files.

nik202 · June 28, 2021, 1:54pm

@riya.shah Please watch this and follow along; if your problem still persists, I will try to run your code. Meanwhile: https://youtu.be/gvyfQZMnHPY

riya.shah · June 28, 2021, 2:17pm

Hey, I watched the video and have implemented it the same way. What I observed is that if the test example has the same sentence structure as a training example, entity extraction works. For example, if the training example is

i renewed it from royal sundaram

and I use the example below for testing, then it works:

i renewed it from tata aig

But if the test example has a sentence structure that was not seen in training, it fails.
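
The config shared earlier in this thread lists RegexFeaturizer but no RegexEntityExtractor. As far as I understand the Rasa docs, in that setup a lookup match only becomes an extra per-token input feature, and the entity extractor still has to learn from annotated examples when to rely on it, which would explain the dependence on sentence structure. A plain Python illustration of such per-token flags, not Rasa's actual featurizer code; the helper name `lookup_flags` is hypothetical:

```python
import re

# Lookup table values as posted earlier in the thread.
LOOKUP = ["HDFC ERGO", "HDFC", "Tata AIG", "Tata", "ICICI", "ICICI Lombard"]


def lookup_flags(text, entries):
    """Return (token, flag) pairs; flag is 1 if the token overlaps a lookup match."""
    ordered = sorted(entries, key=len, reverse=True)
    alternatives = "|".join(re.escape(entry) for entry in ordered)
    pattern = re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)
    matched = [m.span() for m in pattern.finditer(text)]

    flags = []
    for token in re.finditer(r"\S+", text):  # whitespace tokenization
        hit = any(start < token.end() and token.start() < end
                  for start, end in matched)
        flags.append((token.group(0), int(hit)))
    return flags


flags = lookup_flags("i renewed it from tata aig", LOOKUP)
```

The flags mark "tata" and "aig" as lookup hits, but nothing here assigns an entity label. A feature like this only helps if the model has seen enough annotated examples in which the flag and the label coincide.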

nik202 · June 28, 2021, 2:32pm

@riya.shah Well, lookup tables are basically meant for short, single-word values such as locations, cities, or products. The model trains only on whatever you list there. Values outside the table did not work for me either when I used lookup tables, so I no longer use them. I hope this helps. Further reading: NLU Training Data

riya.shah · June 28, 2021, 6:42pm

Hey, it is getting classified to the right intent though. Also, the test examples do have single-word entities, and it fails for single words as well.

nik202 · June 28, 2021, 6:44pm

@riya.shah Any screenshot?

riya.shah · June 28, 2021, 6:51pm

This is the DIETClassifier error report and the lookup table in use.

riya.shah · June 28, 2021, 6:54pm

As you can see, the predicted entities list is [], even though I have given such examples in the lookup table. My question is: even if I don’t have any training example of the form “from xyz”, if it is able to classify the right intent, shouldn’t it be able to extract the entity as well?

nik202 · June 28, 2021, 7:11pm

@riya.shah Because it will only recognise the intent, based on what you mention in the lookup table.

minakshimathpal · February 3, 2022, 12:31pm

@nik202 Hello there. You said that it will recognize only the intent because it is mentioned that way in the lookup table. But it also has some example values of the entity. How do I make it recognize the entity as well?

nik202 · February 3, 2022, 5:56pm

@minakshimathpal Whatever you list in the lookup table, it will recognize only that and nothing else.

minakshimathpal · February 3, 2022, 7:39pm

@nik202 I have 7 entities with 20-25 different values each. For example:

apparels_choice with 10-15 values
color_choice with 20 values
accessories with 10 values, etc.

Now suppose that, out of the 15 values of apparels_choice, my training data has utterances for only 5. I have entered the other 10 values in the lookup table. But the values that are present only in the lookup table and not in the training data are not being picked up…

nik202 · February 3, 2022, 7:57pm

@minakshimathpal Whatever you list in the lookup table, it will take only that. That is the general rule of thumb, and it will be trained on that only.

minakshimathpal · February 3, 2022, 9:04pm

@nik202 In my case it is not working. An entity is recognised only if I have mentioned it in the examples. It does not take into account the entity values (or names) listed in the lookup table.

For example, in my training data I have:

- intent: purchase
  examples: |
    - find me a black sling bag

- lookup: accessories
  examples: |
    - hand bag
    - wallet
    - purse

In this case it does not pick up hand bag, wallet, or purse. It only picks up sling bag, because I have a training example for that.
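
To see which lookup values never appear in annotated training examples, a small plain Python check like the following may help. It is only a sketch: the annotated example is a hypothetical version of the sentence above with the `[value](entity)` markup added, the helper name `uncovered_lookup_values` is made up, and the regex handles only that simple annotation form.

```python
import re

# Matches the simple "[value](entity)" annotation form only.
ANNOTATION = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def uncovered_lookup_values(examples, lookup, entity):
    """List lookup values that are never annotated as `entity` in the examples."""
    seen = {
        value.casefold()
        for example in examples
        for value, name in ANNOTATION.findall(example)
        if name == entity
    }
    return [value for value in lookup if value.casefold() not in seen]


# Hypothetical annotated form of the training sentence from the post.
examples = ["find me a black [sling bag](accessories)"]
lookup = ["hand bag", "wallet", "purse"]

missing = uncovered_lookup_values(examples, lookup, "accessories")
```

Here all three lookup values come back as uncovered, which matches the situation described: only sling bag has an annotated training example.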

nik202 · February 3, 2022, 9:05pm

@minakshimathpal can you please format the code with the entities?
