Can't extract regex into entity

Hello all! Noob here, and I appreciate (in advance!) any help. I’m trying to get the hang of simple tasks in Rasa by making a practice zip code bot, but I can’t find a solution to my problem: I can’t seem to extract entities via regex. It’s likely a small, silly formatting mistake, but I can’t find it… I’ve tried multiple ways of declaring the entities and checked that the regex works, at least in plain Python (it just matches a simple 5-digit number). My pipeline is the default one, all commented out, though I’ve also tried uncommenting it and declaring everything explicitly (the RegexFeaturizer is in there). Also, utter_confirm_zip_code just says something along the lines of “you want to give me your zip code, right?”

From nlu.yml:

- regex: zip_code
  examples: |
    - \d{5}

- intent: give_zip_code
  examples: |
    - my zip code is [23445](zip_code)
    - I live in [54321](zip_code)
    - zip is [00000](zip_code)
    - zip [12345](zip_code)
    - it's [95673](zip_code)
    - This is my zip code [34378](zip_code)
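
For reference, the pattern itself can be sanity-checked outside Rasa. This is a minimal sketch in plain Python using only the standard re module. The helper find_zip_codes and the dict layout are made up for illustration; they are not Rasa's API and not how RegexEntityExtractor is implemented.

```python
import re

# Same pattern as the `- regex: zip_code` entry in nlu.yml.
ZIP_CODE_PATTERN = re.compile(r"\d{5}")

# Training examples from the give_zip_code intent, with the
# [value](entity) annotations stripped down to the raw text.
EXAMPLES = [
    "my zip code is 23445",
    "I live in 54321",
    "zip is 00000",
    "zip 12345",
    "it's 95673",
    "This is my zip code 34378",
]


def find_zip_codes(text):
    """Return every regex match with its character span and matched value."""
    return [
        {"entity": "zip_code", "start": m.start(), "end": m.end(), "value": m.group()}
        for m in ZIP_CODE_PATTERN.finditer(text)
    ]


for example in EXAMPLES:
    print(example, "->", find_zip_codes(example))
```

Every example yields exactly one match, so the pattern is not the problem. Note that without word boundaries, \d{5} also matches the first five digits of a longer number; \b\d{5}\b would be stricter.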

From domain.yml:

intents:
  - greet
  - goodbye
  - affirm
  - deny
  - bot_challenge
  - give_zip_code

entities:
  - zip_code


slots:
  zip_code:
    type: float
    initial_value: 0
    mappings:
    - type: from_entity
      entity: zip_code

From stories.yml:

stories:
- story: get zip code
  steps:
    - intent: greet
    - action: utter_zip_code_bot_greet
    - intent: affirm
    - action: utter_confirm_zip_code
    - intent: give_zip_code
    - action: utter_zip_code_thanks
    - action: utter_goodbye

Hello! :slight_smile:

As mentioned in the docs, you need to use the RegexEntityExtractor component in your NLU pipeline. You can place it after DIETClassifier.

If you want an example pipeline, you can use this:

language: en

pipeline:
- name: SpacyNLP
  model: en_core_web_md
- name: SpacyTokenizer
- name: RegexFeaturizer
  case_sensitive: false
- name: LanguageModelFeaturizer
  model_name: "bert"
  model_weights: "rasa/LaBSE"
  cache_dir: ./.cache
- name: SpacyFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 141
  random_seed: 1
  checkpoint_model: True
- name: RegexEntityExtractor
  case_sensitive: false
  use_lookup_tables: true
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 26
  random_seed: 1
  checkpoint_model: True
- name: FallbackClassifier
  threshold: 0.2

policies:
- name: AugmentedMemoizationPolicy
  max_history: 8
- name: TEDPolicy
  max_history: 8
  epochs: 41
  random_seed: 1
  checkpoint_model: True
- name: RulePolicy
  core_fallback_threshold: 0.2
  core_fallback_action_name: action_default_fallback
  enable_fallback_prediction: true
  restrict_rules: true
  check_for_contradictions: true

Hey, wow, I missed that completely! Thanks for the help!

That being said, I included it in the default pipeline after the DIETClassifier and it still had the same problem, and I tried your pipeline but had trouble running it - let me play with it a bit more this afternoon after work! I’ll update back! Thanks!


What’s the error you’re facing?

Maybe the LanguageModelFeaturizer is throwing an error because you don’t have a folder named .cache?

Chris, Thanks for your help with all this, and sorry I’m having all this difficulty!

One of the bits that I’m having trouble with is that it’s not actually throwing any error at all; it just acts as if I ran off the story (or it didn’t recognize the regex in the intent). I did get your pipeline running after downloading the spaCy model, but I still have the same issue. Here’s more code (sorry for any flippancy in the bot’s comments, I’m just using this one to learn!)

all of domain.yml:

 
intents:
  - greet
  - goodbye
  - affirm
  - deny
  - bot_challenge
  - give_zip_code

entities:
  - zip_code

slots:
  zip_code:
    type: float
    initial_value: 0
    mappings:
    - type: from_entity
      entity: zip_code

responses:
  utter_greet:
  - text: "Hey! How are you?"

  utter_cheer_up:
  - text: "Here is something to cheer you up:"
    image: "https://i.imgur.com/nGF1K8f.jpg"

  utter_did_that_help:
  - text: "Did that help you?"

  utter_happy:
  - text: "Great, carry on!"

  utter_goodbye:
  - text: "Bye, and your zip code is {zip_code}! That's not creepy!"

  utter_iamabot:
  - text: "I am a bot, powered by Rasa."

  utter_confirm_zip_code:
  - text: "It sounds like you're trying to give me your zip code. What is it?"

  utter_zip_code_thanks:
  - text: "I've got your zip code down as {zip_code}, thanks!"

  utter_zip_code_bot_greet:
  - text: "Hi, I'm ZipCodeBot. I want to steal your info. Can I have your zipcode?"

session_config:
  session_expiration_time: 60
  carry_over_slots_to_new_session: true

And the output of interacting with it (in picture form, sorry)

Thanks again! John

No worries!

Your code looks correct, unless I’m also missing something… Try setting the slot type to text instead of float?
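
A quick illustration of why text is the safer type for zip codes in general. This is plain Python only, a sketch that says nothing about how Rasa stores or featurizes a float slot internally; the helper name as_number_and_back is hypothetical.

```python
def as_number_and_back(zip_code):
    """Treat a zip code string as a float, then turn it back into a string."""
    return str(float(zip_code))


# Zip codes are identifiers, not quantities: leading zeros matter.
for raw in ["00000", "02134", "95673"]:
    print(repr(raw), "->", repr(as_number_and_back(raw)))
```

Any value with leading zeros no longer round-trips once it is treated as a number, which is a separate concern from whether the entity gets extracted at all.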

You still have your regex in nlu.yml, correct?

Anyway, try interactive learning via Rasa X.

Yep! I still have the regex in the nlu.yml. Tried swapping out for the “text” type, same result, still has only default value stored in it at the end.

Is Rasa X the easiest way to go to develop a bot and stories? My quick scan had seemed like it’s for deploying an already-mostly-functioning bot, but I probably missed some subtleties there and would happily welcome an “easy mode” :smile:

Man that’s a weird error you’re getting…

It definitely is!

There’s a use-case for every stage of bot-building :slight_smile:

  • In the earlier stages, Interactive Learning can help you create intents, entities, and stories
  • In the later stages, Rasa X lets you share your bot with test users, and Intent Insights will show you a history of inputs and their predicted intents, which you can correct if wrong or confirm to further encourage the behavior.

This is a very simple explanation. Again, Rasa X is made to help you through the whole CDD process. From the docs:

Explore how Rasa X helps you follow Conversation-Driven Development:

  • Share your assistant with users as soon as possible
  • Review conversations on a regular basis
  • Annotate messages and use them as NLU training data
  • Test that your assistant always behaves as you expect
  • Track when your assistant fails and measure its improvement
  • Fix how your assistant handles unsuccessful conversations