Entity Extractor alternatives

The CRFEntityExtractor component increases the system's memory usage and training time as the NLU training data grows. Is there an alternative extractor that handles entity extraction efficiently and uses less time and memory? Memory is the main concern here, as it blocks the whole system…

We’re working on a new architecture. But as a general rule, the more data you have, the more memory you need.

What would be different in this new architecture? Would it solve the problem of excessive memory and time consumption?

What do you mean by “excessive”? How much memory is it already using, and for how much data?

For NLU data of 15-20 MB, it took more than 8 GB of the server’s memory (the server has 8 GB of RAM). @Ghostvv

What is your pipeline?

It is hard to judge from the size in MB. How many examples do you have?


pipeline:
  - name: "SpacyNLP"
  - name: "WhitespaceTokenizer"
  - name: "SpacyFeaturizer"
  - name: "RegexFeaturizer"
  - name: "EntitySynonymMapper"
  - name: "SklearnIntentClassifier"
  - name: "CRFEntityExtractor"
  - name: "DucklingHTTPExtractor"
    url: "http://localhost:8000"
    dimensions: ["time", "number", "distance", "email", "amount-of-money"]
    locale: "en_GB"
    timezone: "Europe/London"
policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy
  - name: "FallbackPolicy"
    nlu_threshold: 0.3
    fallback_action_name: "utter_default_fallback"

This is the pipeline we are using. @Ghostvv

How many intent examples do you have?

@Ghostvv We have approximately 50-60 thousand intent examples. Our examples are actually generated dynamically from the permutations and combinations of entities and their (dynamic) synonym values.

That’s a lot. You can implement an online loader to reduce memory consumption.
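To illustrate what an online loader could look like, here is a minimal Python sketch. It is not part of the Rasa API: the names `iter_examples` and `iter_batches`, the template, and the slot values are all hypothetical. It assumes the examples come from templates filled with entity and synonym values, as described above, and yields them lazily in small batches so the full 50-60 thousand sentences never sit in memory at once.

```python
import itertools
from typing import Dict, Iterable, Iterator, List


def iter_examples(template: str, slots: Dict[str, List[str]]) -> Iterator[str]:
    """Lazily yield one sentence per combination of slot values."""
    names = list(slots)
    for values in itertools.product(*(slots[name] for name in names)):
        yield template.format(**dict(zip(names, values)))


def iter_batches(examples: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group a stream of examples into lists of at most batch_size items."""
    stream = iter(examples)  # make sure we consume a single shared iterator
    while True:
        batch = list(itertools.islice(stream, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical template and slot values, for illustration only.
template = "book a {vehicle} to {city}"
slots = {"vehicle": ["taxi", "cab"], "city": ["London", "Leeds", "York"]}

# Collected into a list here only so the result can be inspected;
# a real loader would iterate over the batches one at a time.
batches = list(iter_batches(iter_examples(template, slots), batch_size=4))
```

Only one batch is held in memory per iteration when the generator is consumed in a loop. Whether this helps in practice depends on whether the component being trained can consume data incrementally, so treat it as a starting point and not as a drop-in fix for the CRFEntityExtractor.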