I am facing an issue when extracting entities that share the same pattern.

Both sentences are identical except for how they begin. How can I tell the two apart and make entity extraction accurate?

training data



@akelad @JulianGerhard @Juste @juste_petr

I am also facing a similar issue. Can anyone help? @akelad @JulianGerhard @Juste @shubham

Hey @shubham, I'd suggest trying the regex features for this problem. You can add patterns for the sentences above:

It's still not working for me; entity extraction is not accurate. Can anyone help me with this?

Where should I make the changes: in my training data, or in the EntityExtractor?

@akelad @JulianGerhard @Juste

What does your NLU config look like?

language: en
pipeline:
  - name: "SpacyNLP"
    model: "en_core_web_lg"
  - name: "SpacyTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
    features: [
      ["low", "title", "upper"],
      ["bias", "low", "prefix5", "prefix2", "suffix5", "digit", "suffix3", "suffix2", "upper", "title", "pattern"],
      ["low", "title", "upper"]
    ]
    BILOU_flag: true
    max_iterations: 50
    L1_c: 0.1
    L2_c: 0.1
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
    stop_words: ['ourselves', 'hers', 'between', 'yourself', 'but', 'again', 'there', 'about', 'once', 'during', 'out', 'very', 'having', 'with', 'they', 'own', 'an', 'be', 'some', 'for', 'do', 'its', 'yours', 'such', 'into', 'of', 'most', 'itself', 'off', 'is', 's', 'am', 'or', 'who', 'as', 'from', 'him', 'each', 'the', 'themselves', 'until', 'below', 'are', 'we', 'these', 'your', 'his', 'through', 'don', 'nor', 'me', 'were', 'her', 'more', 'himself', 'this', 'down', 'should', 'our', 'their', 'while', 'above', 'both', 'up', 'to', 'ours', 'had', 'she', 'all', 'no', 'when', 'at', 'any', 'before', 'them', 'same', 'and', 'been', 'have', 'in', 'will', 'on', 'does', 'yourselves', 'then', 'that', 'because', 'what', 'over', 'why', 'so', 'can', 'did', 'not', 'now', 'under', 'he', 'you', 'herself', 'has', 'just', 'where', 'too', 'only', 'myself', 'which', 'those', 'i', 'after', 'few', 'whom', 't', 'being', 'if', 'theirs', 'my', 'against', 'a', 'by', 'doing', 'it', 'how', 'further', 'was', 'here', 'than']
  - name: "EmbeddingIntentClassifier"
    intent_tokenization_flag: true
    intent_split_symbol: "+"

policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy

I suggest you put them both under one entity called “code” or something, and later differentiate them based on the output of the intent classification. Let me know if that makes sense for your use case.
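
To illustrate the idea, here is a minimal Python sketch of the post-processing step. It assumes the NLU parse result has the usual shape (an `intent` dict with a `name`, plus a list of `entities`). The intent names `give_order_code` and `give_coupon_code`, the target names `order_code` and `coupon_code`, and the helper `resolve_code` are all hypothetical, since the actual sentences and intents are not shown in this thread.

```python
from typing import Any, Dict, Optional

# Hypothetical mapping: which meaning the shared "code" entity takes
# under each intent. Replace these names with your own intents.
INTENT_TO_SLOT = {
    "give_order_code": "order_code",
    "give_coupon_code": "coupon_code",
}


def resolve_code(parse_result: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return {target_name: value} for the first "code" entity, chosen by intent.

    Returns None when the intent is not in the mapping or no "code"
    entity was extracted.
    """
    intent_name = (parse_result.get("intent") or {}).get("name")
    target = INTENT_TO_SLOT.get(intent_name)
    if target is None:
        return None
    for entity in parse_result.get("entities", []):
        if entity.get("entity") == "code":
            return {target: entity["value"]}
    return None


# Made-up parse result, for illustration only.
example = {
    "text": "my order code is AB1234",
    "intent": {"name": "give_order_code", "confidence": 0.93},
    "entities": [{"entity": "code", "value": "AB1234", "start": 17, "end": 23}],
}
print(resolve_code(example))
```

With this approach the entity extractor only has to learn one label, and the beginning of the sentence (which is what differs) is left to the intent classifier. Something like this could live in a custom action that sets the right slot.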