Lookup table didn’t work for RegexFeaturizer + DIETClassifier

I have DIETClassifier in my pipeline, and since lookup tables work only with RegexFeaturizer or RegexEntityExtractor, I added RegexEntityExtractor to the pipeline. But when I run Rasa, it always gives me an error because of conflicting entities extracted by the two extractors.

I cannot remove DIETClassifier from the pipeline because I use entity groups, which are supported only by DIETClassifier and CRFEntityExtractor.

So I chose to go with RegexFeaturizer, but even after providing a fair number of examples, I am still unable to extract entities using the lookup table.

Here are the pipeline components:

language: en
pipeline:
  - name: SpacyNLP
    model: "en_core_web_md"
    case_sensitive: False
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 150
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.8
    ambiguity_threshold: 0.1

The nlu.yml file looks like this:

version: "2.0"
nlu:
- intent: app_filter
  examples: |
    - filter in CP Application
    - filter in CP
    - filters available in CP Application
    - filters available in CP
    - total filters available in CP Application
    - total filters available in CP
    - what is filter
    - what are filter
    - what is purpose of a filter
    - what is the purpose of a filter
    - What is [Card Model]{"entity": "filterName", "group": "group_filter_name"} ?
    - What is [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - What is [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - What are [Card Model]{"entity": "filterName", "group": "group_filter_name"} ?
    - What are [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - What are [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - How many [Card Model]{"entity": "filterName", "group": "group_filter_name"} ?
    - How many [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - How many [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - Total [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - Total [Card Model]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - What is [Device Version]{"entity": "filterName", "group": "group_filter_name"} ?
    - What is [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - What is [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - What are [Device Version]{"entity": "filterName", "group": "group_filter_name"} ?
    - What are [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - What are [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - How many [Device Version]{"entity": "filterName", "group": "group_filter_name"} ?
    - How many [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - How many [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - Total [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - Total [Device Version]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - What is [Product Type]{"entity": "filterName", "group": "group_filter_name"} ?
    - What is [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - What is [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - What are [Product Type]{"entity": "filterName", "group": "group_filter_name"} ?
    - What are [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - What are [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - How many [Product Type]{"entity": "filterName", "group": "group_filter_name"} ?
    - How many [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - How many [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?
    - Total [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP ?
    - Total [Product Type]{"entity": "filterName", "group": "group_filter_name"} in CP Application ?

- lookup: filterName
  examples: |
    - Card Type
    - Chassis Speed
    - Device Model
    - Device Type
    - Device Usage
    - Market Type
    - Node Type
    - Port Speed
    - Optic Type

Can anyone please help me with this issue?

Hi @naveensiwas, could you please also post your domain.yml file and the log output of rasa train? From your NLU file, it seems you are just using a single intent (app_filter). In this case, DIETClassifier might not be trained successfully, since it needs at least two classes as training data.

@MatthiasLeimeister I am using 10 intents; I posted only one (app_filter) as an example. However, I am using only one entity (filterName) to try out the lookup table.

Please find my domain.yml file below :

version: '2.0'
config:
  store_entities_as_slots: true
session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: true
intents:
- bot_greet:
    use_entities: true
- bot_goodbye:
    use_entities: true
- bot_challenge:
    use_entities: true
- app_filter:
    use_entities: true
- app_preference:
    use_entities: true
- app_search_site:
    use_entities: true
- app_sort:
    use_entities: true
- app_show_hide:
    use_entities: true
- app_export:
    use_entities: true
- app_navigate:
    use_entities: true
- nlu_fallback:
    use_entities: true
entities:
- filterName
slots: {}
actions:
- action_app_filter
forms: {}
e2e_actions: []

Please find the log of rasa train attached: rasa_train.txt (6.5 KB)

Thanks for the clarification! The training logs look good. I tried training a model with your config and the app_filter intent plus a simple greet and DIETClassifier succeeded in learning the entities with a nearly perfect cross validation score. Could you please post the output of the following to compare:

rasa test nlu --cross-validation --runs 1 --folds 2

Thanks!

I think you want me to test Rasa NLU with 2-fold cross-validation, but I didn’t understand the meaning of the --runs 1 parameter.

Please find the result in the attached file rasa_cross_validation.txt (14.7 KB).

Thanks

Ah great, thanks! So from the result here:

2022-02-03 17:27:15 INFO     rasa.nlu.test  - Entity extractor: DIETClassifier
2022-02-03 17:27:15 INFO     rasa.nlu.test  - test Accuracy: 1.000 (0.000)
2022-02-03 17:27:15 INFO     rasa.nlu.test  - test F1-score: 1.000 (0.000)
2022-02-03 17:27:15 INFO     rasa.nlu.test  - test Precision: 1.000 (0.000)

it seems that the entity prediction works really well with your pipeline, no?

If there is still a problem, could you describe in detail what goes wrong and what you mean by "but still unable to extract entities using lookup table" in your first post? For example, which test examples fail?

@MatthiasLeimeister Hello there, I am facing the same issue. All the entities that I have annotated in my training data are extracted correctly, but my pipeline fails to extract the values that I entered in the lookup table for the respective entities.

@MatthiasLeimeister This is my pipeline. I am using Rasa 2.0.

@MatthiasLeimeister This is the result after cross-validation.

@MatthiasLeimeister Please help. I have to present the results in Friday’s sprint.

@MatthiasLeimeister Actually, what I mean is that all the entities annotated in my training data are extracted correctly, but the values listed in the lookup table are not extracted.
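
One thing that may explain the perfect score above: cross-validation only evaluates on examples taken from the training data, so it only ever scores entity values that are annotated there. The following is a minimal sketch in plain Python, with the values copied by hand from the nlu.yml posted earlier (the variable names are just for illustration). It shows that none of the lookup values occur in the annotated examples, so the cross-validation result says nothing about them:

```python
# Entity values annotated in the app_filter examples (copied from nlu.yml above).
annotated_values = {"Card Model", "Device Version", "Product Type"}

# Values listed under "lookup: filterName" (copied from nlu.yml above).
lookup_values = {
    "Card Type", "Chassis Speed", "Device Model", "Device Type", "Device Usage",
    "Market Type", "Node Type", "Port Speed", "Optic Type",
}

# Values that appear both as annotated entities and as lookup entries.
overlap = annotated_values & lookup_values
# Lookup values that never appear inside an annotated example.
unseen = sorted(lookup_values - annotated_values)

print("overlap:", overlap)
print("unseen lookup values:", len(unseen))
```

With the data from this thread the overlap is empty, which would be consistent with "perfect in cross-validation, but lookup values are not extracted".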

Right, I see, this is a different config.yml than the one you posted before.

I see two options: you can use either DIETClassifier or CRFEntityExtractor if you need the group annotations. In that case I would suggest adding more diverse training examples, covering all the types that you mention in the lookup table.

If you don’t necessarily need groups, you can use RegexEntityExtractor as the only entity extractor and the lookup table. So you could remove CRFEntityExtractor from your second config. Otherwise you will get the warning about duplicate entities as you mentioned.
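
For the first option (keeping DIETClassifier and adding more diverse examples), the extra training lines do not have to be typed by hand. The sketch below is only one way to do it, in plain Python with no Rasa imports: the `expand` helper and the `{e}` placeholder are made up for this illustration, while the templates and the annotation string follow the app_filter examples posted above.

```python
from itertools import product

# Sentence templates modelled on the app_filter examples; {e} marks the entity slot.
templates = [
    "What is {e} ?",
    "What is {e} in CP Application ?",
    "How many {e} in CP ?",
]

# A few values from the filterName lookup table.
lookup_values = ["Card Type", "Chassis Speed", "Port Speed"]

# Annotation suffix copied from the existing training examples.
ANNOTATION = '{"entity": "filterName", "group": "group_filter_name"}'


def expand(templates, values):
    """Return one annotated NLU example line per (template, value) pair."""
    lines = []
    for template, value in product(templates, values):
        annotated = f"[{value}]{ANNOTATION}"
        # str.replace is used instead of str.format because the annotation contains braces.
        lines.append("    - " + template.replace("{e}", annotated))
    return lines


examples = expand(templates, lookup_values)
print("\n".join(examples))
```

The printed lines can be pasted under the `examples: |` key of the intent. Varying the templates as well as the values is probably more useful than repeating one sentence pattern for every value.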

I made a quick test with the following config:

pipeline:
  - name: SpacyNLP
    model: "en_core_web_md"
    case_sensitive: False
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 150
    constrain_similarities: true
    entity_recognition: False
  - name: RegexEntityExtractor
    use_lookup_tables: True
    use_regexes: True
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.8
    ambiguity_threshold: 0.1

Running rasa shell nlu gives correct entities from the lookup table (that are not in the intent examples):

NLU model loaded. Type a message and press enter to parse it.
Next message:
What is Chassis Speed in CP application?
{
  "text": "What is Chassis Speed in CP application?",
  "intent": {
    "id": -6846465290303250027,
    "name": "app_filter",
    "confidence": 1.0
  },
  "entities": [
    {
      "entity": "filterName",
      "start": 8,
      "end": 21,
      "value": "Chassis Speed",
      "extractor": "RegexEntityExtractor"
    }
  ],
  ...

Does that work for you?
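
To illustrate why the lookup values are found here even though they never appear in an annotated example: an extractor of this kind can match the lookup entries directly against the message text instead of learning them from examples. The following is a rough sketch of that idea in plain Python, not Rasa's actual implementation. `extract_lookup_entities` is a hypothetical helper, and the case-insensitive, word-boundary matching is an assumption.

```python
import re

# A few values from the filterName lookup table.
lookup_values = ["Card Type", "Chassis Speed", "Device Model", "Port Speed"]


def extract_lookup_entities(text, entity, values):
    """Find every lookup value in text and return entity dicts with character offsets."""
    # Longest entries first, so longer phrases win over shorter alternatives.
    ordered = sorted(values, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(v) for v in ordered) + r")\b",
        re.IGNORECASE,
    )
    return [
        {"entity": entity, "start": m.start(), "end": m.end(), "value": m.group(0)}
        for m in pattern.finditer(text)
    ]


result = extract_lookup_entities(
    "What is Chassis Speed in CP application?", "filterName", lookup_values
)
print(result)
```

Because the match is purely textual, it does not depend on how many annotated examples contain the value, but it also cannot assign a group the way DIETClassifier or CRFEntityExtractor can.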

@MatthiasLeimeister Yes, it extracts entities from the lookup table if I keep RegexEntityExtractor after DIETClassifier and set entity_recognition: False in DIETClassifier.

The main reason I keep groups in my training data is that I need to separate similar questions into groups, in order to hit the right API or execute a database query from a custom action.

@MatthiasLeimeister Is there any way to separate/bucket the questions within the same intent?
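
One possible workaround, which is not confirmed anywhere in this thread, is to derive the bucket from the extracted entity value inside the custom action when the extractor does not provide a group. The sketch below is plain Python with no Rasa imports. The bucket names, the `VALUE_TO_BUCKET` mapping, and the `bucket_for` helper are all hypothetical, and the entity dicts follow the shape of the rasa shell nlu output shown above.

```python
# Hypothetical mapping from lookup values to buckets; adjust to the real use case.
VALUE_TO_BUCKET = {
    "chassis speed": "speed_filters",
    "port speed": "speed_filters",
    "card type": "type_filters",
    "device type": "type_filters",
}


def bucket_for(entities, entity_name="filterName", default="unknown"):
    """Return the bucket of the first matching entity, preferring an explicit group."""
    for ent in entities:
        if ent.get("entity") != entity_name:
            continue
        # Use the group when the extractor provides one, else fall back to the mapping.
        if ent.get("group"):
            return ent["group"]
        return VALUE_TO_BUCKET.get(ent.get("value", "").lower(), default)
    return default


entities = [
    {"entity": "filterName", "start": 8, "end": 21,
     "value": "Chassis Speed", "extractor": "RegexEntityExtractor"}
]
print(bucket_for(entities))
```

A custom action could then branch on the returned bucket to choose the API call or database query.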

@minakshimathpal Did you find a solution for this?

@naveensiwas Not yet. I think something is wrong with my config.yml file. Did you find a solution? If yes, please share it with me.

@naveensiwas The pipeline suggested above by MatthiasLeimeister is not working in my case. I am using Rasa 2.0.

@MatthiasLeimeister please help

@minakshimathpal This pipeline works for my use case; please try it out.

I am using Rasa 2.8.14. Can you please explain your scenario in detail? I will try my best to help you.

@naveensiwas nlu.yml (10.9 KB) This is my nlu.yml file. All the entities in the training examples are extracted, but the entities that I listed in the lookup tables are ignored by the above pipeline.