Rasa lookup list not working

Despite giving training examples, my lookup tables are not working reliably.

This is the first question in a form that asks for the user's suburb. I have 93 training examples (included below) as part of the inform intent's NLU training data, plus a lookup table of 3000+ possible entity values (snippet included below).

Even when I type random names from the list, some work and some don't. There is no pattern (apart from trained examples working well); it appears it's just not “seeing” the list.

Any ideas?

My pipeline is:

pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
    case_sensitive: False
    use_lookup_tables: True
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: faq
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: chitchat
  - name: ResponseSelector
    epochs: 100
    retrieval_intent: out_of_scope
  - name: FallbackClassifier
    threshold: 0.2
    ambiguity_threshold: 0.001

Training examples:

nlu:
- intent: inform
  examples: |
    - we're either looking at [Mermaid Beach](location_postcode_or_suburb) or [South Port](location_postcode_or_suburb)
    - thinking about [south port](location_postcode_or_suburb)
    - probably [Kenmore](location_postcode_or_suburb) or Broadbeach
    - around [Mermaid](location_postcode_or_suburb) or thereabouts
    - around [riverhills](location_postcode_or_suburb)
    - [Kenmore](location_postcode_or_suburb)
    - Around [Brisbane](location_postcode_or_suburb)
    - Around [Kenmore](location_postcode_or_suburb)
    - In [Surfers](location_postcode_or_suburb)
    - [Toowong](location_postcode_or_suburb) area
    - looking at the [indroopilly](location_postcode_or_suburb) area
    - [indro](location_postcode_or_suburb)
    - we're interest in [Oxley](location_postcode_or_suburb)
    - we're thinking [Targina](location_postcode_or_suburb) area
    - near [Taringa](location_postcode_or_suburb) school
    - near [Toowong](location_postcode_or_suburb) train station
    - close to [toowong](location_postcode_or_suburb) train
    - close to [chelmer](location_postcode_or_suburb) station
    - interested in [Chelmer](location_postcode_or_suburb)
    - at [Graceville](location_postcode_or_suburb)
    - probably [mermaid waters](location_postcode_or_suburb) and miami area
    - around [kenmore](location_postcode_or_suburb)
    - i'm looking to buy a [3](buyer_min_num_bedrooms) bedroom in [Broadbeach](location_postcode_or_suburb) somewhere under [$450k](buyer_max_price)
    - [Burleigh Heads](location_postcode_or_suburb)
    - [Burleigh](location_postcode_or_suburb)
    - [Palm Beach](location_postcode_or_suburb)
    - [Mermaid Waters](location_postcode_or_suburb)
    - [Mermaid](location_postcode_or_suburb), [Mermaid Waters](location_postcode_or_suburb), [Mermaid Beach](location_postcode_or_suburb)
    - [Mermaid](location_postcode_or_suburb), [Mermaid Waters](location_postcode_or_suburb)
    - anywhere around [Mermaid](location_postcode_or_suburb) or thereabouts
    - anywhere on the [Noosa Headlands](location_postcode_or_suburb)
    - anywhere near [Brisbane](location_postcode_or_suburb)
    - anywhere near the [Noosa Headlands](location_postcode_or_suburb)
    - [Alexandra Hills](location_postcode_or_suburb)
    - [Algester](location_postcode_or_suburb)
    - [Alice Creek](location_postcode_or_suburb)
    - [Alice River](location_postcode_or_suburb)
    - [Allan](location_postcode_or_suburb)
    - [Allandale](location_postcode_or_suburb)
    - [Allenstown](location_postcode_or_suburb)
    - [Allenview](location_postcode_or_suburb)
    - [Alligator Creek](location_postcode_or_suburb)
    - [Allora](location_postcode_or_suburb)
    - [Alloway](location_postcode_or_suburb)
    - [Almaden](location_postcode_or_suburb)
    - [Aloomba](location_postcode_or_suburb)
    - [Alpha](location_postcode_or_suburb)
    - [Alpurrurulam](location_postcode_or_suburb)
    - [Alsace](location_postcode_or_suburb)
    - near [kenmore](location_postcode_or_suburb)
    - [brisbane](location_postcode_or_suburb)
    - [kenmore](location_postcode_or_suburb) or [chappel hill](location_postcode_or_suburb)
    - we're thinking [kenmore](location_postcode_or_suburb) area
    - [brisbane](location_postcode_or_suburb) area
    - [kenmore](location_postcode_or_suburb) area
    - [burbank](location_postcode_or_suburb)
    - in [fig tree pocket](location_postcode_or_suburb)
    - around [kenmore](location_postcode_or_suburb) area
    - around [kenmore](location_postcode_or_suburb)
    - near [Alexandra Headland](location_postcode_or_suburb)
    - near [toowong](location_postcode_or_suburb) bus station
    - near [toowong](location_postcode_or_suburb) bus station would be ideal
    - anywhere in [woodridge](location_postcode_or_suburb)
    - around [spring hill](location_postcode_or_suburb)
    - [noosa](location_postcode_or_suburb)
    - in [burleigh](location_postcode_or_suburb)
    - close to [surfers paradise](location_postcode_or_suburb)
    - anywhere in [Brisbane](location_postcode_or_suburb)
    - near [redcliffe](location_postcode_or_suburb)
    - near [indroo](location_postcode_or_suburb)
    - around [chapel hill](location_postcode_or_suburb)
    - around [noosa](location_postcode_or_suburb) or [noosa headlands](location_postcode_or_suburb)
    - near [toowong](location_postcode_or_suburb) or [indroo](location_postcode_or_suburb) bus station
    - our [house](dwelling) is in [jindalee](location_postcode_or_suburb)
    - we're in [kenmore](location_postcode_or_suburb)
    - [kenmore](location_postcode_or_suburb), [indroo](location_postcode_or_suburb) around there
    - areas like [kenmore](location_postcode_or_suburb), [toowong](location_postcode_or_suburb), [chapel hill](location_postcode_or_suburb)
    - in or around the [ormeau](location_postcode_or_suburb) region
    - anywhere close to [toowong](location_postcode_or_suburb) bus or train station
    - we are looking northside... maybe [mt ommaney](location_postcode_or_suburb) area
    - interested in [kenmore](location_postcode_or_suburb), or somewhere near there
    - the [apartment](dwelling) [complex](dwelling) is in [Jindallee](location_postcode_or_suburb)
    - [upper mt gravatt](location_postcode_or_suburb)
    - [indroo](location_postcode_or_suburb)
    - looking around [kenmore](location_postcode_or_suburb) or [toowong](location_postcode_or_suburb)
    - in or near [corinda](location_postcode_or_suburb)
    - in or around the [kenmore](location_postcode_or_suburb) area
    - city outskirts, [northern suburbs](location_postcode_or_suburb)
    - city outskirts, around the [northern suburbs](location_postcode_or_suburb)
    - we're thinking [toowong](location_postcode_or_suburb) or [indoorpilly](location_postcode_or_suburb) - close to public transport

Lookup table snippet:

nlu:
- lookup: location_postcode_or_suburb
  examples: |
    - northern suburbs
    - eastern suburbs
    - western suburbs
    - southern suburbs
    - north suburbs
    - east suburbs
    - west suburbs
    - south suburbs
    - surfers

@john.christian If it works only some of the time, try deleting the older trained models, then retrain and run again.
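A minimal sketch of that cleanup step, assuming the default setup where `rasa train` writes `.tar.gz` archives into a `models/` folder next to your config. `clear_old_models` is a hypothetical helper name, not part of Rasa.

```python
from pathlib import Path


def clear_old_models(models_dir="models"):
    """Delete previously trained model archives and return their file names.

    Assumes models are stored as *.tar.gz files directly inside models_dir.
    """
    folder = Path(models_dir)
    if not folder.is_dir():
        # Nothing trained yet, so there is nothing to remove.
        return []
    removed = []
    for archive in sorted(folder.glob("*.tar.gz")):
        archive.unlink()
        removed.append(archive.name)
    return removed
```

After calling `clear_old_models()`, retrain with `rasa train` and test the result with `rasa shell nlu` so you know the running model really is the fresh one.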

I fixed this by changing my pipeline to use:

  - name: RegexFeaturizer
    case_sensitive: false
    use_word_boundaries: true
  - name: LexicalSyntacticFeaturizer
  - name: CRFEntityExtractor
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    entity_recognition: false
    epochs: 100

This stops the DIETClassifier from detecting entities (to avoid duplicate detections) and relies on CRFEntityExtractor combined with RegexFeaturizer, which detects entities from lookup tables really well.
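For anyone wondering why the lookup table alone did not help: as far as I understand it, RegexFeaturizer does not extract anything itself. It only flags which tokens match a lookup entry and passes that flag on as a feature to whichever extractor comes after it. Here is a rough Python sketch of that idea. It is an approximation, not Rasa's actual code; `build_lookup_pattern` and `token_flags` are hypothetical helpers, and sorting entries longest-first is my own choice so that multi-word suburbs win over their prefixes.

```python
import re


def build_lookup_pattern(entries, use_word_boundaries=True):
    """Combine lookup entries into one case-insensitive alternation regex."""
    cleaned = [e.strip() for e in entries if e.strip()]
    # Longest first, so "mermaid waters" is tried before "mermaid".
    cleaned.sort(key=len, reverse=True)
    parts = [re.escape(e) for e in cleaned]
    if use_word_boundaries:
        parts = [rf"\b{p}\b" for p in parts]
    return re.compile("(" + "|".join(parts) + ")", re.IGNORECASE)


def token_flags(text, pattern):
    """Return (token, matched) pairs for each whitespace-separated token."""
    spans = [m.span() for m in pattern.finditer(text)]
    flags = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)
        end = start + len(token)
        pos = end
        # A token is flagged if it overlaps any lookup match.
        hit = any(s < end and start < e for s, e in spans)
        flags.append((token, hit))
    return flags


lookup = ["northern suburbs", "surfers", "mermaid", "mermaid waters"]
pattern = build_lookup_pattern(lookup)
flags = token_flags("somewhere near Mermaid Waters or surfers", pattern)
```

The flags are only hints. The extractor still has to learn from the training examples that a flagged token usually means a suburb, so if only a handful of your annotated examples actually appear in the lookup table, the model has little reason to trust the flag.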
