Regex entity extractor generated an incomplete report

Hi, I think I have a problem when generating RegexEntityExtractor_errors.json. The report is generated by running rasa test nlu. Analysing this report, I noticed that the entities are only extracted by CRFEntityExtractor. Below I have pasted an example from the generated file. However, when I use rasa shell nlu, both extractors extract the entity correctly.

RegexEntityExtractor_errors.json

  {
    "text": "Compare my tariff with tariff_type_2",
    "entities": [
      {
        "start": 23,
        "end": 36,
        "value": "tariff_type_2",
        "entity": "tariff_type"
      }
    ],
    "predicted_entities": [
      {
        "entity": "tariff_type",
        "start": 23,
        "end": 36,
        "confidence_entity": 0.9832901695523858,
        "value": "tariff_type_2",
        "extractor": "CRFEntityExtractor"
      }
    ]
  },

NLU:

{
  "text": "Compare my tariff with tariff_type_2",
  "intent": {
    "name": "tariff_comparison",
    "confidence": 0.984128006208084
  },
  "entities": [
    {
      "entity": "tariff_type",
      "start": 23,
      "end": 36,
      "value": "tariff_type_2",
      "extractor": "RegexEntityExtractor"
    },
    {
      "entity": "tariff_type",
      "start": 23,
      "end": 36,
      "confidence_entity": 0.9947785145759475,
      "value": "tariff_type_2",
      "extractor": "CRFEntityExtractor"
    }
  ],

Can anyone explain why this is happening?

Another thing: during cross-validation (5 folds) I got the following warning:

UserWarning: No lookup tables or regexes defined in the training data that have a name equal to any entity in the training data. In order for this component to work you need to define valid lookup tables or regexes in the training data.

I have 50 examples, 19 of which don't contain any lookup table entity. Thank you.

Hi @dsmendes, welcome to the community forum! :slight_smile: Could you please provide more information on your setup? What Rasa version are you using (rasa --version)? Could you also please post all associated YAML files (config.yml, domain.yml, nlu.yml, etc.) that lead to the observed issue? Please enclose the contents in code blocks (using ```). Thanks!

Hi,

my rasa version:

└─ $ rasa --version
Rasa Version      :         2.8.2
Minimum Compatible Version: 2.8.0
Rasa SDK Version  :         2.8.2
Rasa X Version    :         None
Python Version    :         3.8.0
Operating System  :         Linux-5.4.0-90-generic-x86_64-with-glibc2.27
Python Path       :         xxxxxxxxx

I have compressed the Rasa files here: data.zip (8.3 KB)

Thanks

Hi @dsmendes, thanks for sending the config and data files. I was able to reproduce the issue and this looks like a bug in the creation of the cross-validation folds, where the lookup tables are not kept when the train and test data objects are created. I filed a bug report here:
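To illustrate the kind of problem described above, here is a toy sketch in plain Python. It is not Rasa's actual code: `ToyTrainingData`, `split_fold`, `regex_extractor_can_run` and the `keep_lookups` flag are hypothetical names made up for this example, and the value `tariff_type_1` is invented. It only shows how a fold split that copies the examples but not the lookup tables would leave a regex-based extractor with nothing to match against, which would be consistent with the warning seen during cross-validation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ToyTrainingData:
    """Hypothetical stand-in for an NLU training data object (not Rasa's class)."""

    examples: List[str]
    lookup_tables: Dict[str, List[str]] = field(default_factory=dict)


def split_fold(
    data: ToyTrainingData, test_idx: List[int], keep_lookups: bool
) -> Tuple[ToyTrainingData, ToyTrainingData]:
    """Split the examples into train/test; optionally carry the lookup tables over."""
    test_set = set(test_idx)
    train = [ex for i, ex in enumerate(data.examples) if i not in test_set]
    test = [ex for i, ex in enumerate(data.examples) if i in test_set]
    # Assumed failure mode: the lookup tables are dropped when the fold is built.
    lookups = data.lookup_tables if keep_lookups else {}
    return ToyTrainingData(train, dict(lookups)), ToyTrainingData(test, dict(lookups))


def regex_extractor_can_run(data: ToyTrainingData) -> bool:
    """A regex/lookup-based extractor needs at least one lookup table to work with."""
    return bool(data.lookup_tables)


data = ToyTrainingData(
    examples=["Compare my tariff with tariff_type_2", "hello", "bye", "thanks", "help"],
    # "tariff_type_1" is an invented value, used only to fill the toy table.
    lookup_tables={"tariff_type": ["tariff_type_1", "tariff_type_2"]},
)

buggy_train, buggy_test = split_fold(data, test_idx=[0], keep_lookups=False)
fixed_train, fixed_test = split_fold(data, test_idx=[0], keep_lookups=True)

print(regex_extractor_can_run(buggy_train))  # False: lookup tables were lost
print(regex_extractor_can_run(fixed_train))  # True: lookup tables were kept
```

In this sketch, the fold built without the lookup tables corresponds to the situation where only CRFEntityExtractor predictions can show up in the test report.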