Failed to load ABCMeta from model storage. Resource 'train_DIETClassifier5' doesn't exist

I have a Rasa project deployed to a k8s cluster. In one commit I added a few training phrases and intents, changed config.yaml, and updated some custom actions. I ran the bot locally and it worked as expected, but when deployed to the cluster the bot fails to predict intents.

The logs from the remote instance
2023-09-28 16:32:34 INFO     rasa.core.processor  - Loading model models/20230928-185010-spry-rubble.tar.gz...
2023-09-28 16:32:35 DEBUG    rasa.engine.storage.local_model_storage  - Extracted model to '/tmp/tmpa20a_qib'.
/opt/venv/lib/python3.10/site-packages/rasa/shared/core/slot_mappings.py:224: UserWarning: Slot auto-fill has been removed in 3.0 and replaced with a new explicit mechanism to set slots. Please refer to https://rasa.com/docs/rasa/domain#slots to learn more.
  rasa.shared.utils.io.raise_warning(
2023-09-28 16:32:35 DEBUG    rasa.engine.graph  - Node 'nlu_message_converter' loading 'NLUMessageConverter.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG    rasa.engine.graph  - Node 'run_WhitespaceTokenizer0' loading 'WhitespaceTokenizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG    rasa.engine.graph  - Node 'run_RegexFeaturizer1' loading 'RegexFeaturizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_RegexFeaturizer1' was requested for reading.
2023-09-28 16:32:35 DEBUG    rasa.engine.graph  - Node 'run_LexicalSyntacticFeaturizer2' loading 'LexicalSyntacticFeaturizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_LexicalSyntacticFeaturizer2' was requested for reading.
2023-09-28 16:32:35 DEBUG    rasa.engine.graph  - Node 'run_CountVectorsFeaturizer3' loading 'CountVectorsFeaturizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_CountVectorsFeaturizer3' was requested for reading.
2023-09-28 16:32:35 DEBUG    rasa.engine.graph  - Node 'run_CountVectorsFeaturizer4' loading 'CountVectorsFeaturizer.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_CountVectorsFeaturizer4' was requested for reading.
2023-09-28 16:32:35 DEBUG    rasa.engine.graph  - Node 'run_DIETClassifier5' loading 'DIETClassifier.load' and kwargs: '{}'.
2023-09-28 16:32:35 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_DIETClassifier5' was requested for reading.
2023-09-28 16:32:35 DEBUG    rasa.utils.tensorflow.models  - Loading the model from /tmp/tmpljuup8vr/train_DIETClassifier5/DIETClassifier.tf_model with finetune_mode=False...
2023-09-28 16:32:35 DEBUG    rasa.nlu.classifiers.diet_classifier  - Following metrics will be logged during training: 
2023-09-28 16:32:35 DEBUG    rasa.nlu.classifiers.diet_classifier  -   t_loss (total loss)
2023-09-28 16:32:35 DEBUG    rasa.nlu.classifiers.diet_classifier  -   i_acc (intent acc)
2023-09-28 16:32:35 DEBUG    rasa.nlu.classifiers.diet_classifier  -   i_loss (intent loss)
2023-09-28 16:32:35 DEBUG    rasa.nlu.classifiers.diet_classifier  -   e_f1 (entity f1)
2023-09-28 16:32:35 DEBUG    rasa.nlu.classifiers.diet_classifier  -   e_loss (entity loss)
/usr/lib/python3.10/random.py:370: DeprecationWarning: non-integer arguments to randrange() have been deprecated since Python 3.10 and will be removed in a subsequent version
  return self.randrange(a, b+1)
2023-09-28 16:33:06 DEBUG    rasa.nlu.classifiers.diet_classifier  - Failed to load ABCMeta from model storage. Resource 'train_DIETClassifier5' doesn't exist.
2023-09-28 16:33:06 DEBUG    rasa.engine.graph  - Node 'run_EntitySynonymMapper6' loading 'EntitySynonymMapper.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_EntitySynonymMapper6' was requested for reading.
2023-09-28 16:33:06 DEBUG    rasa.engine.graph  - Node 'run_ResponseSelector7' loading 'ResponseSelector.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_ResponseSelector7' was requested for reading.
2023-09-28 16:33:06 DEBUG    rasa.nlu.classifiers.diet_classifier  - Failed to load ABCMeta from model storage. Resource 'train_ResponseSelector7' doesn't exist.
2023-09-28 16:33:06 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_ResponseSelector7' was requested for reading.
2023-09-28 16:33:06 DEBUG    rasa.nlu.selectors.response_selector  - Failed to load ResponseSelector from model storage. Resource 'train_ResponseSelector7' doesn't exist.
2023-09-28 16:33:06 DEBUG    rasa.engine.graph  - Node 'run_FallbackClassifier8' loading 'FallbackClassifier.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG    rasa.engine.graph  - Node 'run_RegexMessageHandler' loading 'RegexMessageHandler.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG    rasa.engine.graph  - Node 'domain_provider' loading 'DomainProvider.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'domain_provider' was requested for reading.
<frozen importlib._bootstrap>:283: DeprecationWarning: the load_module() method is deprecated and slated for removal in Python 3.12; use exec_module() instead
2023-09-28 16:33:06 DEBUG    rasa.engine.graph  - Node 'run_RulePolicy0' loading 'RulePolicy.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_RulePolicy0' was requested for reading.
2023-09-28 16:33:06 DEBUG    rasa.engine.graph  - Node 'run_AugmentedMemoizationPolicy1' loading 'AugmentedMemoizationPolicy.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_AugmentedMemoizationPolicy1' was requested for reading.
2023-09-28 16:33:06 DEBUG    rasa.engine.graph  - Node 'run_TEDPolicy2' loading 'TEDPolicy.load' and kwargs: '{}'.
2023-09-28 16:33:06 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_TEDPolicy2' was requested for reading.
2023-09-28 16:33:06 DEBUG    rasa.utils.tensorflow.models  - Loading the model from /tmp/tmpljuup8vr/train_TEDPolicy2/ted_policy.tf_model with finetune_mode=False...
2023-09-28 16:33:19 DEBUG    rasa.core.policies.ted_policy  - Failed to load ABCMeta from model storage. Resource 'train_TEDPolicy2' doesn't exist.
2023-09-28 16:33:19 DEBUG    rasa.engine.graph  - Node 'rule_only_data_provider' loading 'RuleOnlyDataProvider.load' and kwargs: '{}'.
2023-09-28 16:33:19 DEBUG    rasa.engine.storage.local_model_storage  - Resource 'train_RulePolicy0' was requested for reading.
2023-09-28 16:33:19 DEBUG    rasa.engine.graph  - Node 'select_prediction' loading 'DefaultPolicyPredictionEnsemble.load' and kwargs: '{}'.
2023-09-28 16:33:19 INFO     root  - Rasa server is up and running.

I suppose the prediction fails because of

Failed to load ABCMeta from model storage. Resource 'train_DIETClassifier5' doesn't exist.

Or

Failed to load ABCMeta from model storage. Resource 'train_TEDPolicy2' doesn't exist.

When I run the bot locally, these messages don’t appear and the log shows ‘Finished loading the model.’.
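
Comparing a local and a remote start-up is easier if the missing-resource messages are pulled out mechanically. This is a minimal Python sketch: `summarize_model_load` and `SAMPLE_LOG` are hypothetical names, the regex assumes only the message format visible in the log above, and the ‘Finished loading the model.’ marker is the line reported for the local run, not something checked against Rasa's source.

```python
import re

# Matches the DEBUG lines seen in the remote log, e.g.
# "Failed to load ABCMeta from model storage. Resource 'train_DIETClassifier5' doesn't exist."
FAILED_LOAD = re.compile(
    r"Failed to load \w+ from model storage\. Resource '([^']+)' doesn't exist"
)
# The line the poster reports seeing locally when loading succeeds.
FINISHED = "Finished loading the model."

# A few lines copied from the remote log above (timestamps trimmed).
SAMPLE_LOG = """\
DEBUG rasa.nlu.classifiers.diet_classifier  - Failed to load ABCMeta from model storage. Resource 'train_DIETClassifier5' doesn't exist.
DEBUG rasa.nlu.classifiers.diet_classifier  - Failed to load ABCMeta from model storage. Resource 'train_ResponseSelector7' doesn't exist.
DEBUG rasa.nlu.selectors.response_selector  - Failed to load ResponseSelector from model storage. Resource 'train_ResponseSelector7' doesn't exist.
DEBUG rasa.core.policies.ted_policy  - Failed to load ABCMeta from model storage. Resource 'train_TEDPolicy2' doesn't exist.
INFO     root  - Rasa server is up and running.
"""


def summarize_model_load(log_text: str) -> dict:
    """List resources that failed to load and report whether loading finished."""
    failed = []
    for line in log_text.splitlines():
        match = FAILED_LOAD.search(line)
        # Keep the first occurrence of each resource, in log order.
        if match and match.group(1) not in failed:
            failed.append(match.group(1))
    return {"failed_resources": failed, "finished": FINISHED in log_text}


if __name__ == "__main__":
    print(summarize_model_load(SAMPLE_LOG))
```

On the sample it lists train_DIETClassifier5, train_ResponseSelector7 and train_TEDPolicy2 and reports that loading did not finish. That is consistent with the remote log, where the server still reports "Rasa server is up and running." even though those components failed to load.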

Local versions:

  • Rasa Version : 3.6.10
  • Minimum Compatible Version: 3.5.0
  • Rasa SDK Version : 3.6.2
  • Python Version : 3.10.7
  • Operating System : macOS-12.2.1-arm64-arm-64bit

Remote versions:

  • Rasa Version : 3.6.10
  • Minimum Compatible Version: 3.5.0
  • Rasa SDK Version : 3.6.2
  • Python Version : 3.10.6
  • Operating System : Linux-5.4.0-148-generic-x86_64-with-glibc2.35

My latest config

- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  alias: "cvf-word"
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
  model_confidence: softmax
- name: EntitySynonymMapper
- name: FallbackClassifier
  threshold: 0.3
  ambiguity_threshold: 0.02
policies:
- name: RulePolicy
  priority: 2
  enable_fallback_prediction: False
- name: AugmentedMemoizationPolicy
  priority: 3
  max_history: 10
- name: TEDPolicy
  priority: 1
  epochs: 100
  constrain_similarities: true
  nlu_fallback_threshold: 0.3
  core_fallback_threshold: 0.3
  core_fallback_action_name: action_default_fallback
  enable_fallback_prediction: true

My previous config

- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
  alias: "cvf-word"
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 100
  constrain_similarities: true
  model_confidence: softmax
- name: EntitySynonymMapper
- name: ResponseSelector
  featurizers: ["cvf-word"]
  epochs: 100
  constrain_similarities: true
  model_confidence: softmax
- name: FallbackClassifier
  threshold: 0.2
  ambiguity_threshold: 0.02
policies:
- name: RulePolicy
  enable_fallback_prediction: False
- name: AugmentedMemoizationPolicy
  max_history: 0
- name: TEDPolicy
  epochs: 100
  constrain_similarities: true
  nlu_fallback_threshold: 0.2
  core_fallback_threshold: 0.2
  core_fallback_action_name: action_default_fallback
  enable_fallback_prediction: true
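
Because retraining with the previous config failed in the same way, it helps to see exactly which components differ between the two. This is a hedged Python sketch: the configs are transcribed by hand from the post into Python literals to avoid a YAML dependency (with PyYAML, `yaml.safe_load` on the real files would give the same structure), only the parts shown in the post are included, and `diff_components` is a hypothetical helper, not part of Rasa.

```python
# The two configs as posted, transcribed into Python literals.
PREVIOUS = {
    "pipeline": [
        {"name": "LexicalSyntacticFeaturizer"},
        {"name": "CountVectorsFeaturizer", "alias": "cvf-word"},
        {"name": "CountVectorsFeaturizer", "analyzer": "char_wb", "min_ngram": 1, "max_ngram": 4},
        {"name": "DIETClassifier", "epochs": 100, "constrain_similarities": True, "model_confidence": "softmax"},
        {"name": "EntitySynonymMapper"},
        {"name": "ResponseSelector", "featurizers": ["cvf-word"], "epochs": 100,
         "constrain_similarities": True, "model_confidence": "softmax"},
        {"name": "FallbackClassifier", "threshold": 0.2, "ambiguity_threshold": 0.02},
    ],
    "policies": [
        {"name": "RulePolicy", "enable_fallback_prediction": False},
        {"name": "AugmentedMemoizationPolicy", "max_history": 0},
        {"name": "TEDPolicy", "epochs": 100, "constrain_similarities": True,
         "nlu_fallback_threshold": 0.2, "core_fallback_threshold": 0.2,
         "core_fallback_action_name": "action_default_fallback", "enable_fallback_prediction": True},
    ],
}
LATEST = {
    "pipeline": [
        {"name": "LexicalSyntacticFeaturizer"},
        {"name": "CountVectorsFeaturizer", "alias": "cvf-word"},
        {"name": "DIETClassifier", "epochs": 100, "constrain_similarities": True, "model_confidence": "softmax"},
        {"name": "EntitySynonymMapper"},
        {"name": "FallbackClassifier", "threshold": 0.3, "ambiguity_threshold": 0.02},
    ],
    "policies": [
        {"name": "RulePolicy", "priority": 2, "enable_fallback_prediction": False},
        {"name": "AugmentedMemoizationPolicy", "priority": 3, "max_history": 10},
        {"name": "TEDPolicy", "priority": 1, "epochs": 100, "constrain_similarities": True,
         "nlu_fallback_threshold": 0.3, "core_fallback_threshold": 0.3,
         "core_fallback_action_name": "action_default_fallback", "enable_fallback_prediction": True},
    ],
}


def _key(component: dict) -> tuple:
    # Name plus alias tells the two CountVectorsFeaturizers apart.
    # Assumes no two components share both name and alias.
    return (component["name"], component.get("alias"))


def diff_components(old: list, new: list) -> dict:
    """Report components removed, added, and parameters changed between two lists."""
    old_by_key = {_key(c): c for c in old}
    new_by_key = {_key(c): c for c in new}
    changed = {}
    for key in old_by_key.keys() & new_by_key.keys():
        before, after = old_by_key[key], new_by_key[key]
        params = {p: (before.get(p), after.get(p))
                  for p in before.keys() | after.keys() if before.get(p) != after.get(p)}
        if params:
            changed[key] = params
    return {
        "removed": [k for k in old_by_key if k not in new_by_key],
        "added": [k for k in new_by_key if k not in old_by_key],
        "changed": changed,
    }


if __name__ == "__main__":
    for section in ("pipeline", "policies"):
        print(section, diff_components(PREVIOUS[section], LATEST[section]))
```

On the posted configs it reports that the char_wb CountVectorsFeaturizer and the ResponseSelector were removed, the FallbackClassifier threshold went from 0.2 to 0.3, and the policies gained priority values, a higher max_history and higher fallback thresholds.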

It’s not just that the bot works locally and doesn’t work remotely. I also tried retraining the model with the previous config but with the rest of the latest changes, and it fails in exactly the same way as with the latest config. However, when I deploy the previous version of the code as a whole, it works fine. That’s confusing, because the action code changes shouldn’t affect the NLU part.

Any suggestions about where I should look for the possible causes of this behaviour would be much appreciated.

@nik202 Hello, I’m sorry to bother you, but maybe you could take a look?

As a temporary workaround, I added a training step to the Dockerfile that is built in the CI/CD pipeline, and it started to work:

RUN rasa train --domain /app/domain.yml --data /app/data --out /app/models

I’m still not sure why it doesn’t work when the model is stored in files.

@xxxwarrior is it working using Docker?

Yes.

The Dockerfile for Rasa looks like this now, but before it worked without the training step; I trained the model locally.

FROM rasa/rasa:3.6.10

COPY . /app

USER root

RUN python -m pip install -r /app/requirements.txt

COPY entrypoint-rasa.sh /app/entrypoint-rasa.sh

RUN chmod +x /app/entrypoint-rasa.sh

RUN chown -R 1001:1001 /app

RUN rasa train --domain /app/domain.yml --data /app/data --out /app/models --debug

USER 1001

@xxxwarrior have you looked at my thread on Docker deployment?

@xxxwarrior try this thread, make the necessary changes for your Rasa version, and let me know.

There’s a recently discovered issue with training on arm under 3.6 and then using the model on amd64. It seems to have been introduced in 3.6.x and does not happen with 3.5.x.
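
If this diagnosis is right, the failure comes from training on arm64 (the macOS laptop in the version info above) and loading the model on x86_64 (the Linux cluster node). Besides training inside the image, a deployment could at least detect the mismatch early. This is a minimal sketch, assuming a hypothetical `model_arch.json` sidecar that a CI step writes after training; it is not a Rasa feature, and it only detects the mismatch rather than fixing it.

```python
import json
import platform
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical sidecar file written next to the trained model. Rasa does not
# create or read this file; it exists only for this sketch.
ARCH_FILE_NAME = "model_arch.json"

# Normalize the names different tools use for the same CPU family:
# platform.machine() gives "arm64" on Apple Silicon macOS, "aarch64" on ARM Linux
# and "x86_64" on Intel/AMD Linux; Docker platform names use "arm64"/"amd64".
_ALIASES = {"arm64": "arm64", "aarch64": "arm64", "x86_64": "amd64", "amd64": "amd64"}


def normalize_arch(machine: str) -> str:
    name = machine.lower()
    return _ALIASES.get(name, name)


def record_training_arch(model_dir: Path) -> None:
    """Run right after `rasa train` to note which architecture produced the model."""
    payload = {"arch": normalize_arch(platform.machine())}
    (model_dir / ARCH_FILE_NAME).write_text(json.dumps(payload))


def arch_matches(model_dir: Path, serving_machine: Optional[str] = None) -> bool:
    """Return True if the model was trained on the architecture it is served on."""
    trained = json.loads((model_dir / ARCH_FILE_NAME).read_text())["arch"]
    current = normalize_arch(serving_machine or platform.machine())
    if trained != current:
        print(f"warning: model trained on {trained}, being served on {current}")
    return trained == current


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        models = Path(tmp)
        # Simulate a model trained on an Apple Silicon laptop...
        (models / ARCH_FILE_NAME).write_text(json.dumps({"arch": normalize_arch("arm64")}))
        # ...and checked on an x86_64 cluster node, like the remote instance above.
        print(arch_matches(models, serving_machine="x86_64"))
```

The demo prints a warning and False. Training inside the image, as in the Dockerfile above, avoids the mismatch altogether because the model is built on the same architecture that serves it.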

@stephens I overlooked the fact that I did update my local Rasa version recently, and that sounds like it could be the cause, since the bot works as expected if the model is trained while building the image.

Thank you for mentioning that here, I’ll mark it as a solution.

@nik202 Your Dockerfile also contains the training step, which I thought could be avoided. However, if @stephens is right, I guess training during the build is the easiest way to go. This is how things go from a temporary workaround to a somewhat permanent solution :grinning:

Anyway, thank you for your reply. Have a good one, cheers!

@xxxwarrior Then how are you planning to train the model if you update any code, intents, or examples?

Please enlighten me if you have explored any other way around it; it will help the audience too.

Cheers!

I used to train the model locally and then use the same model remotely. Another way is to train the model, store it somewhere like S3, and load it on the deployed instance. Either way, you don’t have to train the model during the build.

@xxxwarrior Thanks for the clarification, Maria.

For us as hardcore developers, that’s a temporary solution. In a real dev environment we basically depend on a CI/CD pipeline, and we want everything to be automatic. In your case, you are training the model manually and then pushing it, which is not an ideal scenario.

So if a dev or test team (in a collaborative environment) changes anything in the build, the model should be trained and deployed automatically when they push the code, using whatever CI/CD pipeline they have, such as GitHub Actions.

I highly recommend looking at the Rasa GitHub page and its workflow pipelines.
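
One way a CI/CD step could decide automatically whether `rasa train` needs to run is to fingerprint the training inputs and compare against the fingerprint stored after the last training. This is a minimal sketch under stated assumptions: the stamp-file approach, the file names and the choice of inputs (a data directory, config.yml, domain.yml) are all hypothetical, and the actual training and deployment step (for example a GitHub Actions job) is left out. Custom action code is deliberately not an input, since it is not part of the trained model.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def fingerprint(inputs: Iterable[Path]) -> str:
    """Hash the names and contents of all training inputs in a stable order."""
    digest = hashlib.sha256()
    for root in inputs:
        files = sorted(p for p in root.rglob("*") if p.is_file()) if root.is_dir() else [root]
        for file in files:
            name = file.relative_to(root).as_posix() if root.is_dir() else file.name
            digest.update(name.encode() + b"\0")  # renames change the hash too
            digest.update(file.read_bytes() + b"\0")
    return digest.hexdigest()


def needs_retraining(inputs: Iterable[Path], stamp_file: Path) -> bool:
    """True if there is no stamp yet or the inputs changed since the last training."""
    if not stamp_file.exists():
        return True
    return stamp_file.read_text().strip() != fingerprint(inputs)


def mark_trained(inputs: Iterable[Path], stamp_file: Path) -> None:
    """Store the fingerprint after a successful `rasa train`."""
    stamp_file.write_text(fingerprint(inputs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "data").mkdir()
        (project / "data" / "nlu.yml").write_text("version: '3.1'\n")
        (project / "config.yml").write_text("version: '3.1'\n")
        (project / "domain.yml").write_text("version: '3.1'\n")
        inputs = [project / "data", project / "config.yml", project / "domain.yml"]
        stamp = project / ".training_fingerprint"  # hypothetical stamp file
        print(needs_retraining(inputs, stamp))  # True: no stamp yet
        mark_trained(inputs, stamp)
        print(needs_retraining(inputs, stamp))  # False: inputs unchanged
        (project / "data" / "nlu.yml").write_text("version: '3.1'\nnlu: []\n")
        print(needs_retraining(inputs, stamp))  # True: training data changed
```

Note that this only decides *whether* to train; the model still has to be trained on the same CPU architecture that serves it, which is why training inside the image worked here.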

So the approach in my Docker image is the ideal use case, adapted from the Rasa workaround. If you still have any doubts, let me know. Once again, thanks for the clarification. I hope the audience will learn from this conversation.

Thanks and cheers!

Nik
