Lookup not working in entity extraction

Hi,

I’m using RegexFeaturizer with DIETClassifier and trying to use lookup tables for names, but the entities are not recognized correctly when the value is anything other than what appears in the training examples.

Can someone help me with this? Also, is there any limit on how many lookups we can use, or on the number of values in a lookup?

Welcome to the forum :slight_smile:

Can you show me your pipeline? It should have RegexFeaturizer and RegexEntityExtractor as mentioned in the docs.

Also provide at least two examples of the entity in the NLU files.


No, there is no limit (theoretically - you can still run out of RAM/CPU/storage during training) :slight_smile: But the more lookups and examples you use, the slower training will be.

This is my pipeline

- name: WhitespaceTokenizer
- name: RegexFeaturizer
  case_sensitive: False
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  constrain_similarities: true
- name: FallbackClassifier
  threshold: 0.3
  ambiguity_threshold: 0.1

Try this please :slight_smile:

- name: WhitespaceTokenizer
- name: RegexFeaturizer
  case_sensitive: false
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
- name: RegexEntityExtractor # <--- add this
  case_sensitive: false      #
  use_lookup_tables: true    #
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  constrain_similarities: true
- name: FallbackClassifier
  threshold: 0.3
  ambiguity_threshold: 0.1

I used RegexEntityExtractor before, but when I used it together with DIETClassifier it was detecting the entities multiple times.

What do you mean by that? Can you give an example?

Also which version of Rasa are you using?

I’m using Rasa 2.8.10. For example, for ‘What are the sales in 2020’, the sales entity was extracted twice, so the entity had two values in it, both ‘sales’. I will change the config and check once again.

The entities are extracted only when I give the complete value exactly as it appears in the table. How can I get a match on just part of the value? For example, if a name has a first name and a last name, it works only when I give both. Is there a way to extract it when I give only one of them?

I see… that’s weird, but is that so bad? As long as it got extracted correctly, it should be fine, no?

Yes, that’s how lookup tables work. You need to list all possible values of your entity.

That’s why it’s bad for stuff like names, though here’s one: https://raw.githubusercontent.com/ChrisRahme/fyp-chatbot/main/data/lookups/person_name.yml
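To illustrate why a partial name is not picked up, here is a minimal sketch of whole-value matching. It is a simplified illustration, not Rasa’s actual implementation: the function names and the sample values are made up for the example, and the only assumption is that lookup values behave like alternatives in one regular expression with word boundaries.

```python
import re
from typing import List, Pattern


def build_lookup_pattern(values: List[str]) -> Pattern[str]:
    """Combine lookup values into one case-insensitive regex (illustrative only)."""
    # Longer values first, so a long entry wins over a shorter one that is its prefix.
    ordered = sorted(values, key=len, reverse=True)
    alternatives = "|".join(re.escape(value) for value in ordered)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


# Hypothetical lookup table content.
LOOKUP_VALUES = ["John Smith", "Jane Doe"]
PATTERN = build_lookup_pattern(LOOKUP_VALUES)


def find_entities(text: str) -> List[str]:
    """Return every lookup value that appears in full in the text."""
    return [match.group(0) for match in PATTERN.finditer(text)]


print(find_entities("show sales for John Smith"))  # ['John Smith']
print(find_entities("show sales for John"))        # []
```

With this kind of matching, ‘John’ alone only matches if ‘John’ is itself listed as a value.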

I don’t really understand what you mean, sorry.

But in my opinion, the best approach is using a form asking explicitly for two different slots: first_name and last_name, which are to be filled using from_text mapping.

I’m using lookups not just for names but for different types of fields. For example, I have a few IDs and descriptions for those IDs, but the IDs don’t follow any specific pattern. In this case, how can I extract entities properly?

If it doesn’t have a specific pattern, again, use from_text mapping

I don’t understand. Even if I use from_text, how would I know which ID the value belongs to? I have multiple IDs like group ID, category ID etc., and the same goes for descriptions. If I use from_text, I would just get the value in the slot, with its name the same as the value, right?

I encountered a similar problem while building my chatbot. It asked for a user’s login information. It could be a user ID, username, or phone number.

Let’s say the user entered ‘12345678’. Is that an ID, username, or phone number?

The from_text mapping will take any value. Then, in a FormValidationAction’s validate() method:

  1. I query a database to get the user with ID = 12345678.

    It worked? Then it’s an ID. It didn’t? Go to 2.

  2. I query the database to get the user with username = 12345678.

    It worked? Then it’s a username. It didn’t? Go to 3.

  3. I query the database to get the user with phone number = 12345678.

    It worked? Then it’s a phone number. It didn’t? Don’t validate the slot and ask again.
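The steps above can be sketched roughly like this. It is only an outline under assumptions: the USERS list stands in for the real database, and the names find_user and resolve_login are hypothetical. In an actual form validation action, the result would decide whether the slot is kept or reset so that the form asks again.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory stand-in for the real user database.
USERS: List[Dict[str, str]] = [
    {"id": "12345678", "username": "alice", "phone": "5550100"},
    {"id": "87654321", "username": "bob", "phone": "5550199"},
]

# Order in which the value is tried, matching steps 1 to 3.
FIELDS = ("id", "username", "phone")


def find_user(field: str, value: str) -> Optional[Dict[str, str]]:
    """Return the first user whose given field equals the value, if any."""
    for user in USERS:
        if user[field] == value:
            return user
    return None


def resolve_login(value: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Try the value as ID, then username, then phone number.

    Returns (field, user) for the first hit, or None when nothing matches,
    in which case the slot should not be validated and the bot asks again.
    """
    for field in FIELDS:
        user = find_user(field, value)
        if user is not None:
            return field, user
    return None


result = resolve_login("12345678")
print(result[0] if result else "ask again")  # id
```

Each lookup here is one query, so the worst case is three queries per user answer.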


Be gentle with the bot. Don’t expect the impossible. It can do what humans can do, but just faster. If there’s no structure/pattern to your IDs, and they’re not stored anywhere, how will the bot guess which one it is?

Yeah, for that I have created text files containing the values, and I’m using these files as lookups so that I don’t need to query the database several times. But the lookups were not working because I was giving only part of the description, not the exact one. There is a library for fuzzy matching, but I’m not sure how to use it with lookups. Can you help me with it?
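One possible direction, as a minimal sketch: do the partial and fuzzy matching in your own code (for example in a custom action) against the same list of values, instead of inside the lookup table itself. This uses Python’s standard difflib module rather than any particular fuzzy-matching library. The function name match_lookup, the sample values, and the 0.8 cutoff are assumptions made for the example.

```python
import difflib
from typing import List, Optional


def match_lookup(text: str, values: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Map user input to one lookup value, allowing partial input and small typos."""
    needle = text.strip().lower()
    if not needle:
        return None

    # 1. Partial match: the input is contained in a value (e.g. only the first name).
    #    If several values contain it, pick the shortest one as the closest fit.
    partial = [value for value in values if needle in value.lower()]
    if partial:
        return min(partial, key=len)

    # 2. Typo-tolerant match on the whole value, compared in lowercase.
    lowered = {value.lower(): value for value in values}
    close = difflib.get_close_matches(needle, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


# Hypothetical values, as they might be read from one of the lookup text files.
VALUES = ["John Smith", "Jane Doe", "Office supplies"]

print(match_lookup("john", VALUES))       # John Smith
print(match_lookup("Jon Smith", VALUES))  # John Smith
print(match_lookup("xyz", VALUES))        # None
```

If several values contain the same partial input, it is probably safer to ask the user which one they meant than to pick one silently.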