Requesting help with recognizing entities that contain special characters such as hyphens (-)

Hello Rasa community members,

I am trying to implement a use case like this:

User: what’s the latest available docker image tag for rasa/rasa-sdk?

BOT: The latest available docker image for rasa/rasa-sdk is rasa/rasa-sdk:3.1.1

What are my options here for the pipeline config to make the entity extraction work generically?

Current Pipeline:

pipeline:
  - name: SpacyNLP
    model: "en_core_web_lg"
    case_sensitive: false
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: "LexicalSyntacticFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: "DIETClassifier"
    epochs: 100
  - name: SpacyEntityExtractor
    dimensions: ["PERSON"]
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1
  - name: "EntitySynonymMapper"

Sample from nlu.yaml (I have tried with more than 20 examples):

- intent: get_latest_container_image
  examples: |
    - get me the latest docker image for [rasa](meta_name)/[rasa-sdk](image_name)
...
....
...

domain.yml defines the entities meta_name and image_name, plus slots with the same names, which my custom action uses to call an API that runs a Docker search.

It works fine if the user asks for an image name that is already part of the nlu.yaml examples.

User: What’s the latest docker image for rasa/rasa-sdk?

So rasa/rasa-sdk works fine: the BOT identifies rasa as the entity meta_name and rasa-sdk as image_name.

But if the user asks the BOT for any image which is not part of the examples, the entity recognition fails:

User: What’s the latest docker image for rasa/financial-demo?

So for rasa/financial-demo, the meta_name entity is correctly identified as rasa, but the image_name entity is incorrectly identified as financial-.

As you can see, the value for image_name is cut off right after the - character, whereas the correct value is financial-demo. If I add this image to the NLU intent examples it will work, but that is not generic, and it is impractical to add every possible Docker image name!

What’s the best way to make this work generically? I’ve tried regex entity extraction and it did not yield the desired result. I tried a few things, like adding more examples and keeping only the minimum of 2 examples so that the regex entity extractor and the DIET classifier don’t clash, but still no luck.

Thank you!!!

Try a regular expression for the Docker image name.

Hey @stephens,

Thank you so much for your response. I did try a regular expression in my nlu.yaml and added RegexEntityExtractor to my existing pipeline. It resulted in mismatched and conflicting entities between RegexEntityExtractor and DIETClassifier.

I have also followed the note mentioned here and it did not help. I tried both recommended methods: adding only 1 NLU example with the regex entity extractor in the pipeline, and adding multiple examples without the regex entity extractor in the pipeline.

I also referred to one of the examples that uses regex for entity recognition (insurance-demo), and it did not help.

nlu.yaml

nlu:
- regex: image_name
  examples: |
    - ^[a-z\-]*
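
For anyone reading along, here is a small check of how this pattern behaves on a whole message. It is only a sketch with Python's `re` module, not Rasa's own extraction code, and the sample messages and the `first_match` helper are made up for illustration. The `^` anchor ties the match to the very start of the message, and `*` also accepts an empty match, so the image name at the end of the sentence is never reached:

```python
import re

# The pattern from the nlu.yaml snippet above, compiled as-is.
FIRST_ATTEMPT = re.compile(r"^[a-z\-]*")


def first_match(message: str) -> str:
    """Return the text the pattern matches in the message (may be empty)."""
    match = FIRST_ATTEMPT.search(message)
    # Because of `*`, the pattern always matches, possibly with zero characters.
    return match.group() if match else ""


# Hypothetical sample messages, not taken from the training data.
for text in (
    "what is the latest docker image for rasa/financial-demo",
    "Get me the latest image for rasa/financial-demo",
    "financial-demo",
):
    print(repr(text), "->", repr(first_match(text)))
```

In the first message the match is the leading word, in the second it is empty (the capital `G` is outside `[a-z]`), and only a message consisting of nothing but the image name matches in full.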

config.yaml

pipeline:
  - name: SpacyNLP
    model: "en_core_web_lg"
    case_sensitive: false
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: "LexicalSyntacticFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: RegexEntityExtractor
    case_sensitive: false
    use_lookup_tables: true
    use_regexes: true
    use_word_boundaries: true
  - name: "DIETClassifier"
    epochs: 300
  - name: SpacyEntityExtractor
    dimensions: ["PERSON"]
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1
  - name: "EntitySynonymMapper"

Any suggestions please…

Thanks!

-Arunabh

Update: I was able to fix the issue by combining the entire docker image string “rasa/rasa-sdk” into a single entity and using a regular expression to match it.

To make this work, I also had to change my config pipeline (removed SpacyTokenizer and SpacyFeaturizer, and added WhitespaceTokenizer):

pipeline:
  - name: SpacyNLP
    model: "en_core_web_lg"
    case_sensitive: false
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "LexicalSyntacticFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: RegexEntityExtractor
    case_sensitive: false
    use_lookup_tables: false
    use_regexes: true
    use_word_boundaries: true
  - name: "DIETClassifier"
    epochs: 100
  - name: SpacyEntityExtractor
    dimensions: ["PERSON"]
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1
  - name: "EntitySynonymMapper"
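
A likely reason the tokenizer swap matters: as far as I know, spaCy's English tokenizer splits `/` and a hyphen between letters into separate tokens, while whitespace tokenization keeps `rasa/financial-demo` in one piece. The sketch below only imitates that difference in plain Python. `punctuation_splitting_tokens` is a hypothetical stand-in written for this illustration, not spaCy's or Rasa's actual tokenizer, and the sample message is made up:

```python
import re


def whitespace_tokens(text: str) -> list[str]:
    """Split on whitespace only, so 'rasa/financial-demo' stays one token."""
    return text.split()


def punctuation_splitting_tokens(text: str) -> list[str]:
    """Crude stand-in for a tokenizer that splits off '/' and '-'.

    Runs of letters, digits, '.' and '_' become tokens; every other
    non-space character becomes a token of its own.
    """
    return re.findall(r"[A-Za-z0-9._]+|[^\sA-Za-z0-9._]", text)


message = "latest image for rasa/financial-demo"
print(whitespace_tokens(message))
print(punctuation_splitting_tokens(message))
```

Entities are labelled token by token, so once `financial`, `-` and `demo` are separate tokens, the model can label only some of them, which would be consistent with the truncated `financial-` value seen earlier.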

nlu.yaml regex

- regex: img_name
  examples: |
    - ^(([a-z0-9._-]+)(\/([a-z0-9._-]+))?(:[a-z0-9._-]+)?\/)?([a-z0-9._-]+)$
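
A quick way to sanity-check this pattern outside Rasa is to run it over whitespace-separated tokens in plain Python. This is a sketch under assumptions: `re.IGNORECASE` mirrors `case_sensitive: false` from the config, and `matching_tokens` and `split_image` are hypothetical helpers (the latter shows how a custom action might split the combined entity back into the two parts it used before), not part of Rasa:

```python
import re

# The img_name pattern from nlu.yaml above.
IMG_NAME = re.compile(
    r"^(([a-z0-9._-]+)(\/([a-z0-9._-]+))?(:[a-z0-9._-]+)?\/)?([a-z0-9._-]+)$",
    re.IGNORECASE,
)


def matching_tokens(message: str) -> list[str]:
    """Return the whitespace-separated tokens that the pattern fully matches."""
    return [token for token in message.split() if IMG_NAME.match(token)]


def split_image(value: str) -> tuple[str, str]:
    """Split 'rasa/financial-demo' into ('rasa', 'financial-demo').

    Everything before the last '/' is treated as the namespace part;
    a bare name such as 'ubuntu' yields an empty namespace.
    """
    namespace, _, name = value.rpartition("/")
    return namespace, name


print(matching_tokens("latest image for rasa/financial-demo"))
print(IMG_NAME.match("rasa/rasa-sdk:3.1.1"))
print(split_image("rasa/financial-demo"))
```

Two things to keep in mind when reusing the pattern: the namespace prefix is optional, so ordinary words made of the allowed characters match as well, and a name with a trailing tag such as `rasa/rasa-sdk:3.1.1` does not match because `:` is only allowed before the final `/`.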

Cheers!!!