I have a Rasa project deployed to a Kubernetes (k8s) cluster. In one commit I added a few training phrases and intents, changed config.yaml, and updated some custom actions. I ran the bot locally and it worked as expected, but when deployed to the cluster the bot fails to predict intents.
The logs from the remote instance:
2023-09-28 16:32:34 INFO rasa.core.processor - Loading model models/20230928-185010-spry-rubble.tar.gz...
2023-09-28 16:32:35 DEBUG rasa.engine.storage.local_model_storage - Extracted model to '/tmp/tmpa20a_qib'.
/opt/venv/lib/python3.10/site-packages/rasa/shared/core/slot_mappings.py:224: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
rasa.shared.utils.io.raise_warning(
2023-09-28 16:32:35 DEBUG rasa.engine.graph - Node 'nlu_message_converter' loading 'NLUMessageConverter.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG rasa.engine.graph - Node 'run_WhitespaceTokenizer0' loading 'WhitespaceTokenizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG rasa.engine.graph - Node 'run_RegexFeaturizer1' loading 'RegexFeaturizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_RegexFeaturizer1' was requested for reading.
2023-09-28 16:32:35 DEBUG rasa.engine.graph - Node 'run_LexicalSyntacticFeaturizer2' loading 'LexicalSyntacticFeaturizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_LexicalSyntacticFeaturizer2' was requested for reading.
2023-09-28 16:32:35 DEBUG rasa.engine.graph - Node 'run_CountVectorsFeaturizer3' loading 'CountVectorsFeaturizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_CountVectorsFeaturizer3' was requested for reading.
2023-09-28 16:32:35 DEBUG rasa.engine.graph - Node 'run_CountVectorsFeaturizer4' loading 'CountVectorsFeaturizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_CountVectorsFeaturizer4' was requested for reading.
2023-09-28 16:32:35 DEBUG rasa.engine.graph - Node 'run_DIETClassifier5' loading 'DIETClassifier.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_DIETClassifier5' was requested for reading.
2023-09-28 16:32:35 DEBUG rasa.utils.tensorflow.models - Loading the model from /tmp/tmpljuup8vr/train_DIETClassifier5/DIETClassifier.tf_model with finetune_mode=False...
2023-09-28 16:32:35 DEBUG rasa.nlu.classifiers.diet_classifier - Following metrics will be logged during training:
2023-09-28 16:32:35 DEBUG rasa.nlu.classifiers.diet_classifier - t_loss (total loss)
2023-09-28 16:32:35 DEBUG rasa.nlu.classifiers.diet_classifier - i_acc (intent acc)
2023-09-28 16:32:35 DEBUG rasa.nlu.classifiers.diet_classifier - i_loss (intent loss)
2023-09-28 16:32:35 DEBUG rasa.nlu.classifiers.diet_classifier - e_f1 (entity f1)
2023-09-28 16:32:35 DEBUG rasa.nlu.classifiers.diet_classifier - e_loss (entity loss)
/usr/lib/python3.10/random.py:370: DeprecationWarning: non-integer arguments to randrange() have been deprecated since Python 3.10 and will be removed in a subsequent version
return self.randrange(a, b+1)
2023-09-28 16:33:06 DEBUG rasa.nlu.classifiers.diet_classifier - Failed to load ABCMeta from model storage. Resource 'train_DIETClassifier5' doesn't exist.
2023-09-28 16:33:06 DEBUG rasa.engine.graph - Node 'run_EntitySynonymMapper6' loading 'EntitySynonymMapper.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_EntitySynonymMapper6' was requested for reading.
2023-09-28 16:33:06 DEBUG rasa.engine.graph - Node 'run_ResponseSelector7' loading 'ResponseSelector.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_ResponseSelector7' was requested for reading.
2023-09-28 16:33:06 DEBUG rasa.nlu.classifiers.diet_classifier - Failed to load ABCMeta from model storage. Resource 'train_ResponseSelector7' doesn't exist.
2023-09-28 16:33:06 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_ResponseSelector7' was requested for reading.
2023-09-28 16:33:06 DEBUG rasa.nlu.selectors.response_selector - Failed to load ResponseSelector from model storage. Resource 'train_ResponseSelector7' doesn't exist.
2023-09-28 16:33:06 DEBUG rasa.engine.graph - Node 'run_FallbackClassifier8' loading 'FallbackClassifier.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG rasa.engine.graph - Node 'run_RegexMessageHandler' loading 'RegexMessageHandler.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG rasa.engine.graph - Node 'domain_provider' loading 'DomainProvider.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG rasa.engine.storage.local_model_storage - Resource 'domain_provider' was requested for reading.
<frozen importlib._bootstrap>:283: DeprecationWarning: the load_module() method is deprecated and slated for removal in Python 3.12; use exec_module() instead
2023-09-28 16:33:06 DEBUG rasa.engine.graph - Node 'run_RulePolicy0' loading 'RulePolicy.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_RulePolicy0' was requested for reading.
2023-09-28 16:33:06 DEBUG rasa.engine.graph - Node 'run_AugmentedMemoizationPolicy1' loading 'AugmentedMemoizationPolicy.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_AugmentedMemoizationPolicy1' was requested for reading.
2023-09-28 16:33:06 DEBUG rasa.engine.graph - Node 'run_TEDPolicy2' loading 'TEDPolicy.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_TEDPolicy2' was requested for reading.
2023-09-28 16:33:06 DEBUG rasa.utils.tensorflow.models - Loading the model from /tmp/tmpljuup8vr/train_TEDPolicy2/ted_policy.tf_model with finetune_mode=False...
2023-09-28 16:33:19 DEBUG rasa.core.policies.ted_policy - Failed to load ABCMeta from model storage. Resource 'train_TEDPolicy2' doesn't exist.
2023-09-28 16:33:19 DEBUG rasa.engine.graph - Node 'rule_only_data_provider' loading 'RuleOnlyDataProvider.load' and kwargs: '{}'.
2023-09-28 16:33:19 DEBUG rasa.engine.storage.local_model_storage - Resource 'train_RulePolicy0' was requested for reading.
2023-09-28 16:33:19 DEBUG rasa.engine.graph - Node 'select_prediction' loading 'DefaultPolicyPredictionEnsemble.load' and kwargs: '{}'.
2023-09-28 16:33:19 INFO root - Rasa server is up and running.
I suppose the prediction fails due to
Failed to load ABCMeta from model storage. Resource 'train_DIETClassifier5' doesn't exist.
Or
Failed to load ABCMeta from model storage. Resource 'train_TEDPolicy2' doesn't exist.
When I run the bot locally, these messages do not appear, and there is a log line ‘Finished loading the model.’
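One way to check whether the resources named in those messages are present in the deployed archive is to list what the archive contains. This is only a minimal sketch: it assumes the model is an ordinary `.tar.gz` in which each stored resource has its own directory named like the `train_...` resources in the logs, and `list_resources` is a hypothetical helper, not part of Rasa.

```python
import tarfile
from pathlib import PurePosixPath


def list_resources(archive_path, prefix="train_"):
    """Return the sorted, de-duplicated path components starting with `prefix`.

    Assumption: every stored resource lives in its own directory inside the
    archive and is named after the resource (e.g. train_DIETClassifier5).
    """
    found = set()
    with tarfile.open(archive_path, "r:gz") as archive:
        for member_name in archive.getnames():
            # Inspect every component of the member path, not just the last one.
            for part in PurePosixPath(member_name).parts:
                if part.startswith(prefix):
                    found.add(part)
    return sorted(found)
```

Calling `list_resources("models/20230928-185010-spry-rubble.tar.gz")` on both the local and the deployed copy of the archive and comparing the two lists could show whether a resource went missing on the way to the cluster.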
Local versions:
- Rasa Version : 3.6.10
- Minimum Compatible Version: 3.5.0
- Rasa SDK Version : 3.6.2
- Python Version : 3.10.7
- Operating System : macOS-12.2.1-arm64-arm-64bit
Remote versions:
- Rasa Version : 3.6.10
- Minimum Compatible Version: 3.5.0
- Rasa SDK Version : 3.6.2
- Python Version : 3.10.6
- Operating System : Linux-5.4.0-148-generic-x86_64-with-glibc2.35
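The listings above cover only Rasa, the SDK, Python, and the OS. To compare the dependencies installed in the two environments as well, something like the following sketch could be run on both sides. The package list is a hypothetical selection and `package_versions` is an illustrative helper, not a Rasa command.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical selection; extend it with whatever the project depends on.
PACKAGES = ["rasa", "rasa-sdk", "tensorflow", "numpy"]

for name, version in package_versions(PACKAGES).items():
    print(f"{name}: {version}")
```

Diffing the printed output from the local machine and the pod would show any dependency that differs even though the Rasa versions match.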
My latest config:
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  alias: "cvf-word"
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
  model_confidence: softmax
- name: EntitySynonymMapper
- name: FallbackClassifier
  threshold: 0.3
  ambiguity_threshold: 0.02
policies:
- name: RulePolicy
  priority: 2
  enable_fallback_prediction: False
- name: AugmentedMemoizationPolicy
  priority: 3
  max_history: 10
- name: TEDPolicy
  priority: 1
  epochs: 100
  constrain_similarities: true
  nlu_fallback_threshold: 0.3
  core_fallback_threshold: 0.3
  core_fallback_action_name: action_default_fallback
  enable_fallback_prediction: true
My previous config:
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  alias: "cvf-word"
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
  model_confidence: softmax
- name: EntitySynonymMapper
- name: ResponseSelector
  featurizers: ["cvf-word"]
  epochs: 100
  constrain_similarities: true
  model_confidence: softmax
- name: FallbackClassifier
  threshold: 0.2
  ambiguity_threshold: 0.02
policies:
- name: RulePolicy
  enable_fallback_prediction: False
- name: AugmentedMemoizationPolicy
  max_history: 0
- name: TEDPolicy
  epochs: 100
  constrain_similarities: true
  nlu_fallback_threshold: 0.2
  core_fallback_threshold: 0.2
  core_fallback_action_name: action_default_fallback
  enable_fallback_prediction: true
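To relate the two pipelines to the node names in the logs, the sketch below builds log-style names from a list of components. It rests on two assumptions: that node names follow the pattern `run_<Component><index>`, as the logs suggest, and that both pipelines begin with `WhitespaceTokenizer` and `RegexFeaturizer`, which appear in the logs but not in the config excerpts. `node_names` is a hypothetical helper.

```python
def node_names(pipeline):
    """Build log-style node names, assuming the pattern run_<Component><index>."""
    return [f"run_{name}{index}" for index, name in enumerate(pipeline)]


# Assumption: both pipelines start with the two components seen in the logs.
HEAD = ["WhitespaceTokenizer", "RegexFeaturizer"]

latest = HEAD + [
    "LexicalSyntacticFeaturizer",
    "CountVectorsFeaturizer",
    "DIETClassifier",
    "EntitySynonymMapper",
    "FallbackClassifier",
]
previous = HEAD + [
    "LexicalSyntacticFeaturizer",
    "CountVectorsFeaturizer",
    "CountVectorsFeaturizer",
    "DIETClassifier",
    "EntitySynonymMapper",
    "ResponseSelector",
    "FallbackClassifier",
]

print("latest:  ", node_names(latest))
print("previous:", node_names(previous))
```

Comparing the printed names with the `run_...` nodes in the remote logs may help confirm which pipeline the loaded model was actually trained with.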
It's not just that the bot works locally and fails remotely. I also retrained the model with the previous config while keeping the rest of the latest changes, and it fails in exactly the same way as with the latest config. However, when I deploy the previous version of the code as a whole, it works fine. That's confusing, because changes to the action code shouldn't affect the NLU part.
Any suggestions about where to look for possible causes of this behaviour would be very much appreciated.