Not getting entity extracted

I have trained the model on some data, e.g.:
can I travel using a train
Can I travel using a bus

But when I ask the bot "can I travel using a car", it does not extract "car" as an entity; I have to train explicitly on "car" to get it.
Is there any workaround to get an entity recognized without specifically training on it?

I am even using spaCy in my pipeline:

  - name: spellchecking.SpellCheckerEN
  - name: SpacyNLP
    model: "en_core_web_lg"
  - name: SpacyTokenizer
  - name: SpacyEntityExtractor
  - name: SpacyFeaturizer
    pooling: mean
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 2
  - name: CountVectorsFeaturizer
    analyzer: char
    min_ngram: 3
    max_ngram: 5
  - name: DIETClassifier
    epochs: 1

You can experiment with removing the CountVectorsFeaturizers from your pipeline. The network will then learn word representations based only on spaCy, so the features for "bus" and "car" might be close enough. If you use char-based featurizers, the model gives too much weight to the actual letters of a word rather than the meaning provided by spaCy. If that does not help, you can try other dense word embeddings, still without any CountVectorsFeaturizer.
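As a rough sketch of that suggestion, here is the pipeline from the question with both CountVectorsFeaturizer entries removed, so that DIETClassifier only sees the dense spaCy features. Nothing else is changed; `spellchecking.SpellCheckerEN` is the asker's own custom component, and whether this is enough for "car" to be picked up has not been verified here.

```yaml
pipeline:
  # Custom component from the question, kept unchanged
  - name: spellchecking.SpellCheckerEN
  - name: SpacyNLP
    model: "en_core_web_lg"
  - name: SpacyTokenizer
  - name: SpacyEntityExtractor
  # Dense word vectors only; no character n-gram featurizers
  - name: SpacyFeaturizer
    pooling: mean
  - name: DIETClassifier
    # Value kept from the question; a single epoch is probably too few
    # for DIET to learn much, so consider raising it
    epochs: 1
```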

I would recommend using lookup tables. Modes of transport are essentially a finite list of items anyway, so I don't imagine training a model would do it any justice.
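A minimal sketch of what a lookup table could look like, assuming the YAML training data format of Rasa 2.x or later (older versions use a Markdown format instead). The intent name `ask_travel` and the entity name `transport` are made up for illustration; use whatever names your training data already has.

```yaml
# data/nlu.yml (assumed file name)
version: "2.0"
nlu:
  - intent: ask_travel          # hypothetical intent name
    examples: |
      - can I travel using a [train](transport)
      - can I travel using a [bus](transport)
  - lookup: transport           # must match the entity name used above
    examples: |
      - train
      - bus
      - car
```

The lookup table only has an effect if a component in the pipeline reads it. One option, again as an assumption about your setup, is to add `RegexEntityExtractor`, which matches lookup entries directly:

```yaml
# config.yml: extra pipeline entry
pipeline:
  - name: RegexEntityExtractor
    use_lookup_tables: true
    case_sensitive: false
```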

If you are keen on training a model, consider adding some variations to your training data…

because then, if someone writes "can I travel using a spaceship" or "a horse", it will be captured as well :smiley: Just saying.
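To illustrate what "variations" could mean, here is a small sketch in the Rasa 2.x YAML training data format (an assumption, as are the intent name `ask_travel` and the entity name `transport`). The idea is to vary both the entity values and the sentence around them, so the model learns the position and context of the entity rather than memorizing two words.

```yaml
version: "2.0"
nlu:
  - intent: ask_travel          # hypothetical intent name
    examples: |
      - can I travel using a [train](transport)
      - can I travel using a [bus](transport)
      - is it possible to go there by [taxi](transport)
      - I want to travel by [bike](transport)
      - could I take a [ferry](transport) instead
      - can I get there using a [tram](transport)
```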
