spaCy pretrained models break chatbot NLU capabilities


Setting up the pipeline to use pretrained_embeddings_spacy breaks the NLU abilities of our bot, whatever the language setting.

config.yml file content:

language: fr
pipeline: pretrained_embeddings_spacy

policies:
  - name: MemoizationPolicy
  - name: KerasPolicy
  - name: MappingPolicy
  - name: FormPolicy
  - name: "FallbackPolicy"
    nlu_threshold: 0.7 # Min confidence needed to accept an NLU prediction
    core_threshold: 0.5 # Min confidence needed to accept an action prediction from Rasa Core
    fallback_action_name: "action_incompréhension"

Is there anything else to set up?

Thanks for your help.
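For reference, the two thresholds in the FallbackPolicy entry decide when the bot gives up and runs `action_incompréhension` instead of the predicted action. The following is a simplified sketch of that decision rule, assuming only the two comparisons described in the config comments; it is not Rasa's implementation (which also takes dialogue state into account), and the helpers `should_fall_back` and `next_action` are hypothetical.

```python
# Hypothetical, simplified model of the FallbackPolicy thresholds in config.yml.
# Not Rasa's implementation: it only reproduces the two comparisons
# described in the config comments.

NLU_THRESHOLD = 0.7   # min confidence needed to accept an NLU (intent) prediction
CORE_THRESHOLD = 0.5  # min confidence needed to accept an action prediction
FALLBACK_ACTION = "action_incompréhension"


def should_fall_back(intent_confidence: float, action_confidence: float) -> bool:
    """Return True if either prediction is below its threshold."""
    return intent_confidence < NLU_THRESHOLD or action_confidence < CORE_THRESHOLD


def next_action(predicted_action: str, intent_confidence: float,
                action_confidence: float) -> str:
    """Return the fallback action instead of the predicted one when unsure."""
    if should_fall_back(intent_confidence, action_confidence):
        return FALLBACK_ACTION
    return predicted_action


if __name__ == "__main__":
    # Confident intent and action: the predicted action is kept.
    print(next_action("utter_greet", 0.92, 0.81))  # utter_greet
    # Intent confidence 0.55 < 0.7: the fallback action is chosen.
    print(next_action("utter_greet", 0.55, 0.81))  # action_incompréhension
```

Under this rule, any pipeline that tends to output lower intent confidences routes more messages to the fallback action, even when its top-ranked intent is correct, so the intent confidences are worth inspecting alongside accuracy.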


How many training examples do you have?

Fewer than 1,000, as far as I can tell:

rasa.nlu.training_data.training_data  - Training data stats:
        - intent examples: 214 (10 distinct intents)
        - Number of response examples: 0 (0 distinct response)
        - entity examples: 0 (0 distinct entities)
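These stats report only totals: 214 examples across 10 intents is an average of about 21 examples per intent, and an average can hide intents that have far fewer. A minimal sketch of a per-intent breakdown, assuming the examples have already been loaded as `(text, intent)` pairs; the `intent_stats` helper and the sample data are hypothetical.

```python
from collections import Counter


def intent_stats(examples):
    """Summarize (text, intent) pairs, with a per-intent breakdown."""
    counts = Counter(intent for _, intent in examples)
    total = sum(counts.values())
    print(f"intent examples: {total} ({len(counts)} distinct intents)")
    # Fewest examples first: these are the intents the classifier
    # has the least evidence for.
    for intent, n in sorted(counts.items(), key=lambda item: item[1]):
        print(f"  {intent}: {n}")
    return counts


if __name__ == "__main__":
    # Hypothetical sample data, not the thread's actual training set.
    sample = [
        ("bonjour", "greet"),
        ("salut", "greet"),
        ("au revoir", "goodbye"),
        ("je voudrais réserver", "book"),
        ("réserver une table", "book"),
        ("une table pour deux", "book"),
    ]
    intent_stats(sample)
```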

I would still try out the supervised_embeddings pipeline and/or stay with the spaCy pipeline but change single components.

Indeed, supervised_embeddings yields satisfactory results; I was just wondering how the pretrained spaCy model affects overall accuracy and performance.

However, what exactly are “single components”? Our proprietary content in the file?

No. For example, supervised_embeddings is equivalent to the following pipeline:

pipeline:
  - name: "WhitespaceTokenizer"
  - name: "RegexFeaturizer"
  - name: "CRFEntityExtractor"
  - name: "EntitySynonymMapper"
  - name: "CountVectorsFeaturizer"
  - name: "CountVectorsFeaturizer"
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: "EmbeddingIntentClassifier"

Every entry is a single component of the NLU pipeline and influences the NLU result, so you can play around with the single components to achieve better results.
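To illustrate what one component contributes: the second CountVectorsFeaturizer (analyzer char_wb, min_ngram 1, max_ngram 4) counts character n-grams taken from inside word boundaries rather than whole words. The sketch below re-implements that n-gram extraction with the standard library, modeled on scikit-learn's `char_wb` analyzer, which CountVectorsFeaturizer builds on; the function `char_wb_ngrams` is hypothetical and is not Rasa's code.

```python
def char_wb_ngrams(text, min_n=1, max_n=4):
    """Character n-grams taken only from inside word boundaries.

    Each word is padded with one space on either side, so n-grams at the
    edges of a word mark word starts and ends.
    """
    ngrams = []
    for word in text.lower().split():
        padded = f" {word} "
        for n in range(min_n, max_n + 1):
            # Slide a window of width n; a word no longer than n yields one n-gram.
            count = max(len(padded) - n + 1, 1)
            for start in range(count):
                ngrams.append(padded[start:start + n])
            if count == 1:
                break  # longer windows would only repeat the whole padded word
    return ngrams


if __name__ == "__main__":
    # "table" and "tables" are different tokens for a word-level count vector,
    # but they share most character n-grams, so their features overlap heavily.
    shared = set(char_wb_ngrams("table")) & set(char_wb_ngrams("tables"))
    print(len(shared), "shared n-grams:", sorted(shared))
```

Word-level counts from the first CountVectorsFeaturizer treat `table` and `tables` as unrelated; the character n-grams are one way the supervised pipeline can cope with inflected forms and typos without pretrained vectors.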


Ah, alright, I’ll take a look. Thank you very much for your help.