Config for Spanish Bot

I am creating a bot for a project, but I am having problems with the training results. Is this the best configuration I can use for Spanish?

pipeline:
  - name: "SpacyNLP"
    model: "es_core_news_sm"
    case_sensitive: False
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 200

policies:
  - name: RulePolicy
  - name: AugmentedMemoizationPolicy
    max_history: 6
  - name: TEDPolicy
    max_history: 10
    epochs: 20
    batch_size:
      - 32
      - 64
    constrain_similarities: true

Hmm, have you tried the default config for your pipeline? The default, as written, should work for most whitespace-separated languages (including Spanish), especially if you have a fair amount of training data.
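For reference, here is a rough sketch of what the default pipeline looks like. It is based on the defaults suggested in recent Rasa 2.x/3.x releases, so treat the exact values as an assumption and compare against the commented-out defaults in the config.yml that `rasa init` generates for your version. The main difference from your config is that it uses WhitespaceTokenizer instead of the spaCy components.

```yaml
# Sketch of a default-style pipeline (values assumed from recent Rasa defaults;
# verify against the config.yml generated by `rasa init` for your version).
language: es

pipeline:
  - name: WhitespaceTokenizer        # replaces SpacyNLP + SpacyTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer     # word-level bag of words
  - name: CountVectorsFeaturizer     # character n-grams, helps with typos
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
    constrain_similarities: true
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 100
    constrain_similarities: true
  - name: FallbackClassifier         # routes low-confidence messages to a fallback intent
    threshold: 0.3
    ambiguity_threshold: 0.1
```

Your policies section can stay as it is; only the NLU pipeline changes here.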

If you’re not working in a news domain (or with other fairly formal written text) and aren’t getting the results you want, you might also consider investing in training a custom spaCy model, since es_core_news_sm is trained on news-style text.
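If you do go the custom-model route, the change on the Rasa side should be small. A minimal sketch, assuming you have trained a spaCy model and installed it as a Python package; the name `es_custom_chat_model` is hypothetical and stands in for whatever your package is called:

```yaml
pipeline:
  - name: SpacyNLP
    # Hypothetical package name: replace with your own installed spaCy model.
    model: "es_custom_chat_model"
    case_sensitive: false
  - name: SpacyTokenizer
  - name: SpacyFeaturizer
  # ...the remaining featurizers and classifiers stay as in your current config
```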

(Sorry for the kinda vague answer, but “what’s the best pipeline for a specific chatbot” will probably require a bit of guessing & testing.)
