I am currently using a pipeline something like this:
```yaml
pipeline:
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "rasa/LaBSE"
    cache_dir: /tmp
  - name: LanguageModelFeaturizer
    model_name: "bert"
    model_weights: "rasa/LaBSE"
    cache_dir: /tmp
    alias: LMF
  - name: LanguageModelTokenizer
    intent_tokenization_flag: False
    intent_split_symbol: "_"
  - name: RegexFeaturizer
  - name: CountVectorsFeaturizer
    alias: CVF
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
    use_shared_vocab: True
  - name: DIETClassifier
    batch_strategy: balanced
    intent_split_symbol: "+"
    intent_tokenization_flag: True
    epochs: 300
    batch_size: 50
  - name: CRFEntityExtractor
  - name: EntitySynonymMapper
  - name: ResponseSelector
    featurizers: [CVF, LMF]
    epochs: 300
    retrieval_intent: faq
  - name: ResponseSelector
    featurizers: [CVF, LMF]
    epochs: 300
    retrieval_intent: chitchat
  - name: FallbackClassifier
    threshold: 0.4
    ambiguity_threshold: 0.1
```
Is it mandatory to use:

```yaml
model_weights: "rasa/LaBSE"
```

or can I cherry-pick other pretrained weights instead, for example:

```yaml
model_weights: "bert-large-uncased"
```

And which one would be better to use?
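In case it clarifies the question, this is how I would expect the featurizer entry to look with the alternative weights (just a sketch, assuming `model_weights` accepts any BERT checkpoint name from the Hugging Face hub that matches `model_name: "bert"` — I haven't confirmed this is supported):

```yaml
- name: LanguageModelFeaturizer
  model_name: "bert"
  # assumption: any public BERT checkpoint from the Hugging Face hub is accepted here
  model_weights: "bert-large-uncased"
  cache_dir: /tmp
  alias: LMF
```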
Thanks