mean spacy
Yes, but not by much in my experience (at least if you have good training data).
Can you show me your pipeline/config to make sure?
version: "2.0"
language: en_core_web_md
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7
  - name: DucklingEntityExtractor
    url: http://duckling.rasa.com:8000
    dimensions:
      - amount-of-money
      - time
      - number
  - name: SpacyNLP
    model: "en_core_web_md"
    case_sensitive: false
  - name: "SpacyEntityExtractor"
    # Note: It is not possible to use the SpacyTokenizer + SpacyFeaturizer in
    # combination with the WhitespaceTokenizer, and as a result the
    # PERSON extraction by Spacy is not very robust.
    # Because of this, the nlu training data is annotated as well, and the
    # DIETClassifier will also extract PERSON entities.
    dimensions: ["PERSON"]
  - name: EntitySynonymMapper
policies:
  - name: AugmentedMemoizationPolicy
  - name: TEDPolicy
    epochs: 40
  - name: RulePolicy
    core_fallback_threshold: 0.4
    core_fallback_action_name: "action_default_fallback"
    enable_fallback_prediction: True
Maybe you can use this in the meantime.
version: "2.0"
language: en_core_web_md
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7
  - name: DucklingEntityExtractor
    url: http://duckling.rasa.com:8000
    dimensions:
      - amount-of-money
      - time
      - number
  - name: EntitySynonymMapper
policies:
  - name: AugmentedMemoizationPolicy
  - name: TEDPolicy
    epochs: 40
  - name: RulePolicy
    core_fallback_threshold: 0.4
    core_fallback_action_name: "action_default_fallback"
    enable_fallback_prediction: True
I don’t know who from the Rasa team I should tag for this, so I’ll try pinging @koaning since he answered a similar question.
Alternatively, and I’m not sure it will work, try keeping your original pipeline and changing only line 2 from
language: en_core_web_md
to
language: en
It didn’t work…
But with your config it trained, so I can continue with the course at least.
The language: en configuration in pipeline.yml is mainly used by pipeline components to throw an error if a language is not supported. For example, if you indicated that the pipeline is for a Chinese assistant, the WhitespaceTokenizer would throw an error, because this form of tokenisation does not work for a language that does not separate words with whitespace.
It’s good practice to list a modern ISO language abbreviation there, but it doesn’t influence the pipeline beyond throwing errors, and en_core_web_md is not an ISO code. Rather, it’s the name that spaCy uses to refer to its medium English model.
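To make the distinction concrete, here is a rough sketch of a check that tells a two-letter ISO-style code apart from a spaCy model name. The helper names are hypothetical and this is not how Rasa validates the setting internally; it only tests the shape of the string, not whether the code is a real language.

```python
import re


def looks_like_iso_code(language: str) -> bool:
    """Rough shape check: two lowercase letters, e.g. 'en' or 'zh'.

    Hypothetical helper, not part of Rasa. It does not verify that the
    code is actually assigned to a language.
    """
    return re.fullmatch(r"[a-z]{2}", language) is not None


def describe_language_setting(language: str) -> str:
    """Explain whether a `language:` value has the expected shape."""
    if looks_like_iso_code(language):
        return f"{language!r} has the shape of an ISO language code"
    return (
        f"{language!r} does not look like an ISO language code; "
        "a spaCy model name goes under the SpacyNLP `model` key instead"
    )


if __name__ == "__main__":
    for value in ("en", "zh", "en_core_web_md"):
        print(describe_language_setting(value))
```

With this check, en passes and en_core_web_md does not, which matches the advice to put the model name under the SpacyNLP component and keep language: en at the top of the file.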
In general, I’d advise folks to install everything via:
python -m pip install "rasa[spacy]"
python -m spacy download en_core_web_md
By using python -m there’s less confusion between virtual environments, because pip and spaCy then run under the same interpreter that python points to. If you’re interested in learning more about this phenomenon, you may appreciate the small course that I’ve made here.
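If you want to confirm which environment you are actually in, a small check like the one below can help. It is only a sketch: the function name is made up, and it assumes the downloaded spaCy model is installed as an importable package called en_core_web_md. Run it with the same python you use for rasa train.

```python
import importlib.util
import sys


def check_installed(module_names):
    """Map each top-level module name to whether this interpreter can find it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


if __name__ == "__main__":
    # The interpreter path shows which virtual environment is active.
    print("Interpreter:", sys.executable)
    # Assumption: the spaCy model is importable under its own name.
    for name, found in check_installed(["rasa", "spacy", "en_core_web_md"]).items():
        print(f"{name}: {'found' if found else 'missing'}")
```

If rasa is found but en_core_web_md is missing, the model was most likely downloaded into a different environment than the one Rasa runs in.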
@Avi1, is your issue solved now?
Not really, but I took your suggestion to apply the config below and continue from there, so at least I could proceed with the course:
version: "2.0"
language: en_core_web_md
pipeline:
  - name: WhitespaceTokenizer
  - name: RegexFeaturizer
  - name: LexicalSyntacticFeaturizer
  - name: CountVectorsFeaturizer
  - name: CountVectorsFeaturizer
    analyzer: "char_wb"
    min_ngram: 1
    max_ngram: 4
  - name: DIETClassifier
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.7
  - name: DucklingEntityExtractor
    url: http://duckling.rasa.com:8000
    dimensions:
      - amount-of-money
      - time
      - number
  - name: EntitySynonymMapper
policies:
  - name: AugmentedMemoizationPolicy
  - name: TEDPolicy
    epochs: 40
  - name: RulePolicy
    core_fallback_threshold: 0.4
    core_fallback_action_name: "action_default_fallback"
    enable_fallback_prediction: True