How to handle lower case names/entities in the message?

I have this input to the NLU - “get me all movies with tom hanks before 2020”

How do I configure Rasa, or change the training data, so that it extracts “tom hanks” as an actors entity? As suggested in other threads/questions, I tried the case_sensitive: false option in the pipeline, but that doesn’t seem to change the behavior. The parse result below still only picks up “tom”, and with low confidence:

{
  "text": "get me all movies with tom hanks before 2020",
  "intent": {
    "name": "query_movies",
    "confidence": 1.0
  },
  "entities": [
    {
      "entity": "object_type",
      "start": 11,
      "end": 17,
      "confidence_entity": 0.9941898584365845,
      "value": "movie",
      "extractor": "DIETClassifier",
      "processors": [
        "EntitySynonymMapper"
      ]
    },
    {
      "entity": "actors",
      "start": 23,
      "end": 26,
      "confidence_entity": 0.4991215467453003,
      "value": "tom",
      "extractor": "DIETClassifier"
    },
    {
      "entity": "publication_year",
      "start": 33,
      "end": 44,
      "confidence_entity": 0.998313307762146,
      "role": "lt",
      "confidence_role": 0.3657113313674927,
      "value": "before 2020",
      "extractor": "DIETClassifier"
    }
  ],

Training samples:

      - show me [movies]{"entity": "object_type", "value": "movie"} with [ryan gosling](actors)
      - list [movies]{"entity": "object_type", "value": "movie"} with [emma stone](actors)

Pipeline:

  - name: WhitespaceTokenizer
    case_sensitive: false
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier
    threshold: 0.3
    ambiguity_threshold: 0.1

There’s a good blog post on entity extraction here. I would try spaCy, as recommended in the post.
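
As a rough sketch of what that could look like (not tested against your data): the pipeline below adds Rasa's SpacyNLP, SpacyTokenizer, SpacyFeaturizer and SpacyEntityExtractor components to your existing config. The model name en_core_web_md is an assumption; use whichever spaCy model you have installed. Keep in mind that pretrained spaCy models lean on capitalization too, so lowercase names like “tom hanks” may still be missed, and it is worth checking with `rasa shell nlu` before relying on it.

```yaml
pipeline:
  # Loads the spaCy model; the model name here is an assumption
  - name: SpacyNLP
    model: en_core_web_md
    case_sensitive: false
  # SpacyTokenizer replaces WhitespaceTokenizer when spaCy components are used
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  # Pretrained NER; only keep PERSON so it does not add unrelated entities
  - name: SpacyEntityExtractor
    dimensions: ["PERSON"]
  - name: EntitySynonymMapper
```

Note that SpacyEntityExtractor returns the entity as PERSON rather than actors, so you would need to handle that name in your slots or actions. DIETClassifier will also keep extracting actors from your own examples, so you may see both for the same span.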
