How to use SpacyEntityExtractor

Hi! I looked around the forums but couldn't find anything about how to use the SpacyEntityExtractor. I want to use it to identify names and dates, but I'm not entirely sure how. I read the documentation and saw how to add the extractor and what it does. However, I'm not sure how to use the extracted entities in my chatbot.

This is what my config pipeline looks like.

language: en
pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"
- name: "SpacyEntityExtractor"

Please let me know if you need more information.

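Hi, SpacyEntityExtractor (see Components in the docs) relies on spaCy's pretrained named-entity recognition model, which tags entities using the BILOU scheme. The entity types it extracts (PERSON, DATE, ORG, ...) come from that pretrained model, not from your own training data.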

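Also, looking at your pipeline, the components are just listed together rather than in processing order. Input text is usually processed as: SpacyNLP (loads the language model) > SpacyTokenizer (splits the text into tokens) > featurizers such as SpacyFeaturizer and RegexFeaturizer > an entity extractor such as CRFEntityExtractor > intent classifier. So your order could be: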

pipeline:
- name: "SpacyNLP"
- name: "SpacyTokenizer"
- name: "SpacyFeaturizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"   # or "SpacyEntityExtractor"
- name: "EntitySynonymMapper"
- name: "SklearnIntentClassifier"
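The ordering rule above can be checked mechanically. This is a minimal sketch, assuming the pipeline has already been loaded from config.yml into a list of dicts (for example with PyYAML); the helper name `check_spacy_order` and the set of spaCy-dependent components it lists are illustrative assumptions, not part of Rasa's API.

```python
from typing import Any, Dict, List

# Hypothetical, non-exhaustive set: components from this thread that need
# the spaCy language model loaded by SpacyNLP.
SPACY_DEPENDENT = {"SpacyTokenizer", "SpacyFeaturizer", "SpacyEntityExtractor"}


def check_spacy_order(pipeline: List[Dict[str, Any]]) -> List[str]:
    """Return a list of ordering problems (an empty list means none found)."""
    names = [component["name"] for component in pipeline]
    problems: List[str] = []

    if "SpacyNLP" not in names:
        users = [name for name in names if name in SPACY_DEPENDENT]
        if users:
            problems.append("SpacyNLP is missing but required by: " + ", ".join(users))
        return problems

    # Every spaCy-dependent component must come after SpacyNLP.
    nlp_index = names.index("SpacyNLP")
    for index, name in enumerate(names):
        if name in SPACY_DEPENDENT and index < nlp_index:
            problems.append(f"{name} appears before SpacyNLP")
    return problems


if __name__ == "__main__":
    suggested = [
        {"name": "SpacyNLP"},
        {"name": "SpacyTokenizer"},
        {"name": "SpacyFeaturizer"},
        {"name": "RegexFeaturizer"},
        {"name": "CRFEntityExtractor"},
        {"name": "EntitySynonymMapper"},
        {"name": "SklearnIntentClassifier"},
    ]
    print(check_spacy_order(suggested))  # []
```

Rasa performs its own requirement checks when training; a helper like this only makes the "SpacyNLP first" convention explicit.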

Thanks for this information. I was wondering how I can get the values extracted by spaCy in actions.py. I tried tracker.get_slot("PERSON"), but that did not work.

Have you defined a slot named "PERSON"? What does print(tracker.slots) output? If the user input is a person's name, SpacyEntityExtractor should extract it as an entity with the label PERSON. Say the user's last message is "Alexander". Then in actions.py, tracker.latest_message.get("entities") returns [{'entity': 'ORG', 'value': 'Alexander', 'start': 0, 'confidence': None, 'end': 9, 'extractor': 'SpacyEntityExtractor'}]. Note that here the pretrained model labelled the name as ORG rather than PERSON, so check which label you actually get before relying on it.
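Reading the entities inside a custom action can be sketched as a small helper that works on the `tracker.latest_message` dictionary. This is a minimal sketch, assuming the entity dicts have the shape shown above (`entity`, `value`, `extractor` keys); the helper name `entity_values` is hypothetical. Inside an action's `run` method you would call it as `entity_values(tracker.latest_message, "PERSON")`.

```python
from typing import Any, Dict, List, Optional


def entity_values(
    latest_message: Dict[str, Any],
    label: str,
    extractor: Optional[str] = "SpacyEntityExtractor",
) -> List[str]:
    """Return the values of all entities carrying the given label.

    If `extractor` is set, only entities produced by that component are kept,
    so e.g. CRFEntityExtractor results with the same label are ignored.
    Pass extractor=None to accept entities from any extractor.
    """
    entities = latest_message.get("entities") or []
    return [
        entity["value"]
        for entity in entities
        if entity.get("entity") == label
        and (extractor is None or entity.get("extractor") == extractor)
    ]


if __name__ == "__main__":
    # The message from the example above: spaCy labelled "Alexander" as ORG.
    message = {
        "text": "Alexander",
        "entities": [
            {
                "entity": "ORG",
                "value": "Alexander",
                "start": 0,
                "confidence": None,
                "end": 9,
                "extractor": "SpacyEntityExtractor",
            }
        ],
    }
    print(entity_values(message, "PERSON"))  # []
    print(entity_values(message, "ORG"))  # ['Alexander']
```

Reading entities this way does not need a slot at all, whereas tracker.get_slot("PERSON") only returns a value if a slot with that name exists in your domain and has been set.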


Hey, I am trying to train Rasa to recognise any name. I have changed my pipeline as follows:

language: "en"
pipeline:
- name: "WhitespaceTokenizer"
- name: "RegexFeaturizer"
- name: "CRFEntityExtractor"
- name: "EntitySynonymMapper"
- name: "CountVectorsFeaturizer"
- name: "EmbeddingIntentClassifier"
- name: "SpacyNLP"
  model: "en_core_web_md"
- name: "SpacyEntityExtractor"
  dimensions: ["PERSON"]

Is this enough for the bot to recognise names, or do I need to make changes in other files as well?
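The `dimensions` option restricts which spaCy labels SpacyEntityExtractor passes on. Conceptually it behaves like the filter below; this is a simplified, hypothetical sketch over plain (text, label) pairs, not Rasa's actual source, and it assumes that leaving `dimensions` empty or unset keeps every label.

```python
from typing import Iterable, List, Optional, Tuple


def filter_dimensions(
    spacy_entities: Iterable[Tuple[str, str]],
    dimensions: Optional[List[str]] = None,
) -> List[Tuple[str, str]]:
    """Keep only (text, label) pairs whose label is listed in `dimensions`.

    An empty or missing `dimensions` list keeps every entity unchanged.
    """
    entities = list(spacy_entities)
    if not dimensions:
        return entities
    return [(text, label) for text, label in entities if label in dimensions]


if __name__ == "__main__":
    # Hypothetical model output; real labels depend on the spaCy model used.
    found = [("Alexander", "PERSON"), ("Monday", "DATE"), ("Berlin", "GPE")]
    print(filter_dimensions(found, ["PERSON"]))  # [('Alexander', 'PERSON')]
```

The setting can only narrow what the pretrained model already finds: with `dimensions: ["PERSON"]`, a name the model misses or mislabels (as in the ORG example earlier in the thread) is dropped rather than recovered.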