Financial-demo bot following the course steps:

I mean spaCy.

Yes, but in my experience not by much (at least if you have good training data).

Can you show me your pipeline/config to make sure?

version: "2.0"
language: en_core_web_md
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7
  - name: DucklingEntityExtractor
    url: http://duckling.rasa.com:8000
    dimensions:
    - amount-of-money
    - time
    - number
  - name: SpacyNLP
    model: "en_core_web_md"
    case_sensitive: false
  - name: "SpacyEntityExtractor"
    # Note: It is not possible to use the SpacyTokenizer + SpacyFeaturizer in
    #       combination with the WhitespaceTokenizer, and as a result the
    #       PERSON extraction by Spacy is not very robust.
    #       Because of this, the nlu training data is annotated as well, and the
    #       DIETClassifier will also extract PERSON entities.
    dimensions: ["PERSON"]
  - name: EntitySynonymMapper
policies:
- name: AugmentedMemoizationPolicy
- name: TEDPolicy
  epochs: 40
- name: RulePolicy
  core_fallback_threshold: 0.4
  core_fallback_action_name: "action_default_fallback"
  enable_fallback_prediction: True

Maybe you can use this in the meantime.

version: "2.0"
language: en_core_web_md
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7
  - name: DucklingEntityExtractor
    url: http://duckling.rasa.com:8000
    dimensions:
    - amount-of-money
    - time
    - number
  - name: EntitySynonymMapper
policies:
- name: AugmentedMemoizationPolicy
- name: TEDPolicy
  epochs: 40
- name: RulePolicy
  core_fallback_threshold: 0.4
  core_fallback_action_name: "action_default_fallback"
  enable_fallback_prediction: True

I don’t know who from the Rasa team I should tag for this. I’ll try pinging @koaning since he answered a similar question.

Alternatively, though I’m not sure it will work, try keeping your original pipeline and changing only line 2 from

language: en_core_web_md

to

language: en

It didn’t work…

But with your config it trained, so I can at least continue with the course.


The language: en configuration in pipeline.yml is mainly used in pipeline components to throw an error if a language is not supported. For example, if you indicated the pipeline is for a Chinese assistant then the WhitespaceTokenizer would throw an error because this form of tokenisation does not work for a language with no whitespace.

It’s good practice to list a modern ISO language abbreviation there, but it doesn’t influence the pipeline beyond throwing errors, and en_core_web_md is not an ISO standard. Rather, it’s a name that spaCy uses to refer to its medium model.
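To make the role of the language key concrete, here is a minimal Python sketch of the kind of check described above. It is not Rasa's actual implementation: `UNSUPPORTED_BY_WHITESPACE` and `check_language` are hypothetical names, and the set of unsupported languages is an assumption chosen for illustration.

```python
# Hypothetical sketch, not Rasa's code: a component rejects languages
# that it declares it cannot handle.
UNSUPPORTED_BY_WHITESPACE = {"zh", "ja", "th"}  # assumed examples of languages written without whitespace


def check_language(language: str, unsupported: set) -> None:
    """Raise if the configured language is on the component's unsupported list."""
    if language in unsupported:
        raise ValueError(f"language '{language}' is not supported by this component")


# An ISO code for a whitespace-delimited language passes silently.
check_language("en", UNSUPPORTED_BY_WHITESPACE)

# A spaCy model name is not on the list either, so this sketch accepts it too;
# the value simply carries no useful meaning for a check like this.
check_language("en_core_web_md", UNSUPPORTED_BY_WHITESPACE)
```

In this sketch, passing "zh" would raise a ValueError, which mirrors the WhitespaceTokenizer example above.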

In general, I’d advise folks to install everything via:

python -m pip install "rasa[spacy]"
python -m spacy download en_core_web_md

By using python -m there’s less confusion between virtual environments. If you’re interested in learning more about this phenomenon you may appreciate the small course that I’ve made here.
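If you want to confirm which interpreter you are in and whether it can see the packages, a small check like the one below may help. It is a sketch that uses only the standard library; `is_installed` is a hypothetical helper name. Run it with the same python you used for the install commands.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can find the given top-level module."""
    return importlib.util.find_spec(module_name) is not None


# Show which Python executable is running, then report on each package.
print("interpreter:", sys.executable)
for name in ("rasa", "spacy", "en_core_web_md"):
    print(name, "installed" if is_installed(name) else "missing")
```

If en_core_web_md shows up as missing here, the model was probably downloaded into a different environment than the one running this script.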


@Avi1, is your issue solved now?

Not really, but I took your suggestion to apply the config below and continue from there, so at least I could proceed with the course:

version: "2.0"
language: en_core_web_md
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7
  - name: DucklingEntityExtractor
    url: http://duckling.rasa.com:8000
    dimensions:
    - amount-of-money
    - time
    - number
  - name: EntitySynonymMapper
policies:
- name: AugmentedMemoizationPolicy
- name: TEDPolicy
  epochs: 40
- name: RulePolicy
  core_fallback_threshold: 0.4
  core_fallback_action_name: "action_default_fallback"
  enable_fallback_prediction: True