Training fails after extending Rasa Image

Hey guys,

Since I extended the Rasa images for the production and worker containers, training and uploading models both fail with the following error:
ERROR:rasax.community.api.blueprints.models:500, message='Internal Server Error', url=URL('http://rasa-worker:5005/model/train?token=rnyL9iBRVkzsW7f')

I extended the images with the following Dockerfile:

# Extend the official Rasa SDK image
FROM rasa/rasa:latest-full

USER root

# Use subdirectory as working directory
WORKDIR /app

# Copy any additional custom requirements, if necessary (uncomment next line)
COPY requirements.txt ./

# Change back to root user to install dependencies
USER root

# Install extra requirements for actions code, if necessary (uncomment next line)
RUN pip install -r requirements.txt
RUN [ "python", "-c", "import nltk; nltk.download('vader_lexicon')" ]

# Copy sentiment analyzer to working directory
COPY sentiment_analyzer.py /app

#
CMD ["start","--actions","actions"]

# By best practices, don't run the code with root user
USER 1001

# add as environment variable
ENV PYTHONPATH=$PYTHONPATH:/app

The requirements are basically only the nltk module. This is the content of my sentiment analyzer:

from rasa.nlu.components import Component
from rasa.nlu import utils
from rasa.nlu.model import Metadata
from rasa.nlu.extractors.extractor import EntityExtractor
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import os

class SentimentAnalyzer(EntityExtractor):
    """A pre-trained sentiment component"""

    name = "sentiment"
    defaults = {}
    language_list = ["en"]

    def __init__(self, component_config=None):
        super(SentimentAnalyzer, self).__init__(component_config)

    def train(self, training_data, cfg, **kwargs):
        """Not needed, because the model is pretrained"""
        pass



    def convert_to_rasa(self, value, confidence):
        """Convert model output into the Rasa NLU compatible output format."""
        
        entity = {"value": value,
                  "confidence": confidence,
                  "entity": "sentiment",
                  "extractor": "sentiment_extractor"}

        return entity


    def process(self, message, **kwargs):
        """Retrieve the text message, pass it to the classifier
            and append the prediction results to the message class."""

        sid = SentimentIntensityAnalyzer()
        res = sid.polarity_scores(message.text)
        key, value = max(res.items(), key=lambda x: x[1])

        entity = self.convert_to_rasa(key, value)

        message.set("entities", [entity], add_to_output=True)

    def persist(self,dir,anything):
        """Pass because a pre-trained model is already persisted"""

        pass
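
For readers following along, here is a minimal sketch of the selection step in `process`, using hard-coded scores instead of a real NLTK call. VADER's `polarity_scores` returns four keys (`neg`, `neu`, `pos`, `compound`), so taking the maximum over all of them can return `compound` as the label. The numbers and the `pick_label` helper are illustrative assumptions, not part of the original component.

```python
def pick_all_keys(scores):
    """Mirror of the selection in process(): highest value over every key."""
    return max(scores.items(), key=lambda kv: kv[1])


def pick_label(scores, labels=("neg", "neu", "pos")):
    """Hypothetical variant: highest value among the three proportion keys only."""
    return max(((key, scores[key]) for key in labels), key=lambda kv: kv[1])


if __name__ == "__main__":
    # Made-up scores in the shape polarity_scores() returns.
    example = {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.8}
    print(pick_all_keys(example))
    print(pick_label(example))
```

Whether `compound` should be allowed as a label depends on what the downstream stories expect, so treat the restricted variant as one option rather than a fix.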

This is my config file:

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: "en"
pipeline:
  - name: HFTransformersNLP
    model_weights: "bert-base-uncased"
    model_name: "bert"
  - name: LanguageModelTokenizer
    intent_tokenization_flag: True
    intent_split_symbol: "+"
  - name: sentiment_analyzer.SentimentAnalyzer
  - name: LanguageModelFeaturizer
  - name: RegexFeaturizer
  - name: DIETClassifier
    epochs: 150
  - name: EntitySynonymMapper
  - name: ResponseSelector
    epochs: 150

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: TEDPolicy
    max_history: 5
    epochs: 100
  - name: MappingPolicy
  - name: FormPolicy

Since I get an empty error log from the worker container, I have no idea where to look. The server is deployed on a Google VM running Ubuntu. Versions: Rasa = 1.10.10, Rasa X = 0.31.5, Python = 3.7.8

Does anyone have an idea where to look for an error? And why is the error log empty anyway? I would really appreciate the help, as I have limited time to make this work…

The Dockerfile above is configured to run an action server, not a rasa-worker or rasa-production container, so you cannot use it for training.

What changes do you want to make to the worker & production containers?

Since I added the sentiment analyzer, the worker needs access to the nltk module and the VADER lexicon. Otherwise I get an error every time I try to train or load a model. How do I configure the Dockerfile to run a rasa-worker/production container instead?

Although the start command in your Dockerfile is incorrect, if you are running it with the Rasa X docker-compose setup, it should be overridden by the command provided there, so I’m not sure that is the issue.

The empty worker logs are definitely not helpful! Can you try adding --debug to the startup command to see if you can see more logs then?

That would be the one here:

  command: >
    x
    --no-prompt
    --production
    --config-endpoint http://rasa-x:5002/api/config?token=${RASA_X_TOKEN}
    --port 5005
    --jwt-method HS256
    --jwt-secret ${JWT_SECRET}
    --auth-token '${RASA_TOKEN}'
    --cors "*"

I added this to my docker-compose.override.yml:

  rasa-production:
    command: >
      x
      --no-prompt
      --production
      --config-endpoint http://rasa-x:5002/api/config?token=${RASA_X_TOKEN}
      --port 5005
      --jwt-method HS256
      --jwt-secret ${JWT_SECRET}
      --auth-token '${RASA_TOKEN}'
      --cors "*"
      --debug
  rasa-worker:
    command: >
      x
      --no-prompt
      --production
      --config-endpoint http://rasa-x:5002/api/config?token=${RASA_X_TOKEN}
      --port 5005
      --jwt-method HS256
      --jwt-secret ${JWT_SECRET}
      --auth-token '${RASA_TOKEN}'
      --cors "*"
      --debug

Then I ran docker-compose up -d and tried to train a model. This is the error log of the worker:

2020-08-17 08:25:47 DEBUG    sanic_jwt.configuration  - validating provided secret
2020-08-17 08:25:47 DEBUG    sanic_jwt.configuration  - validating keys (if needed)
2020-08-17 08:25:47 DEBUG    sanic_jwt.configuration  - loading secret and/or keys (if needed)
2020-08-17 08:25:47 DEBUG    rasa.core.utils  - Available web server routes:
> /conversations/<conversation_id>/messages          POST                           add_message
> /conversations/<conversation_id>/tracker/events    POST                           append_events
> /auth                                              POST                           auth_bp.AuthenticateEndpoint
> /auth/me                                           GET                            auth_bp.RetrieveUserEndpoint
> /auth/verify                                       GET                            auth_bp.VerifyEndpoint
> /webhooks/rasa                                     GET                            custom_webhook_RasaChatInput.health
> /webhooks/rasa/webhook                             POST                           custom_webhook_RasaChatInput.receive
> /model/test/intents                                POST                           evaluate_intents
> /model/test/stories                                POST                           evaluate_stories
> /conversations/<conversation_id>/execute           POST                           execute_action
> /domain                                            GET                            get_domain
> /                                                  GET                            hello
> /model                                             PUT                            load_model
> /model/parse                                       POST                           parse
> /conversations/<conversation_id>/predict           POST                           predict
> /conversations/<conversation_id>/tracker/events    PUT                            replace_events
> /conversations/<conversation_id>/story             GET                            retrieve_story
> /conversations/<conversation_id>/tracker           GET                            retrieve_tracker
> /status                                            GET                            status
> /model/predict                                     POST                           tracker_predict
> /model/train                                       POST                           train
> /conversations/<conversation_id>/trigger_intent    POST                           trigger_intent
> /model                                             DELETE                         unload_model
> /version                                           GET                            version
> 2020-08-17 08:25:47 INFO     root  - Starting Rasa server on http://localhost:5005
> 2020-08-17 08:25:47 DEBUG    rasa.core.utils  - Using the default number of Sanic workers (1).
> 2020-08-17 08:25:47 INFO     root  - Enabling coroutine debugging. Loop id 94122030671584.
> 2020-08-17 08:25:47 DEBUG    root  - Could not load interpreter from 'None'.
> /opt/venv/lib/python3.7/site-packages/rasa/core/brokers/pika.py:346: FutureWarning: Your Pika event broker config contains the deprecated `queue` key. Please use the `queues` key instead.
>   docs=DOCS_URL_PIKA_EVENT_BROKER,
> 2020-08-17 08:25:47 DEBUG    rasa.core.brokers.broker  - Instantiated event broker to 'PikaEventBroker'.
> 2020-08-17 08:25:47 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=21, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.48.7', 59390)>; error=111 (Connection refused)
> 2020-08-17 08:25:47 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.48.4', 5672))
> 2020-08-17 08:25:47 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
> 2020-08-17 08:25:47 DEBUG    rasa.core.tracker_store  - Attempting to connect to database via 'postgresql://admin:***@db:5432/rasa'.
> 2020-08-17 08:25:47 DEBUG    rasa.core.tracker_store  - Connection to SQL database 'worker_tracker' successful.
> 2020-08-17 08:25:47 DEBUG    rasa.core.tracker_store  - Connected to SQLTrackerStore.
> 2020-08-17 08:25:47 DEBUG    rasa.core.lock_store  - Connected to lock store 'RedisLockStore'.
> 2020-08-17 08:25:47 DEBUG    rasa.core.nlg.generator  - Instantiated NLG to 'TemplatedNaturalLanguageGenerator'.
> 2020-08-17 08:25:47 DEBUG    rasa.core.agent  - Requesting model from server http://rasa-x:5002/api/projects/default/models/tags/production...
> 2020-08-17 08:25:48 DEBUG    rasa.core.agent  - Model server could not find a model at the requested endpoint 'http://rasa-x:5002/api/projects/default/models/tags/production'. It's possible that no model has been trained, or that the requested tag hasn't been assigned.
> 2020-08-17 08:25:48 DEBUG    rasa.core.agent  - No new model found at URL http://rasa-x:5002/api/projects/default/models/tags/production
> 2020-08-17 08:25:48 INFO     root  - Rasa server is up and running.
> 2020-08-17 08:25:52 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=25, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.48.7', 59414)>; error=111 (Connection refused)
> 2020-08-17 08:25:52 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.48.4', 5672))
> 2020-08-17 08:25:52 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
> 2020-08-17 08:25:57 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=25, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.48.7', 59498)>; error=111 (Connection refused)
> 2020-08-17 08:25:57 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.48.4', 5672))
> 2020-08-17 08:25:57 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
> 2020-08-17 08:25:58 DEBUG    rasa.core.agent  - Requesting model from server http://rasa-x:5002/api/projects/default/models/tags/production...
> 2020-08-17 08:26:02 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=26, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.48.7', 59512)>; error=111 (Connection refused)
> 2020-08-17 08:26:02 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.48.4', 5672))
> 2020-08-17 08:26:02 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
> 2020-08-17 08:26:02 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/tmp/tmpnj8gazhv/config.yml' is 'unk'.
> 2020-08-17 08:26:02 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/tmp/tmpnj8gazhv/domain.yml' is 'unk'.
> 2020-08-17 08:26:02 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/tmp/tmpnj8gazhv/nlu.md' is 'md'.
> 2020-08-17 08:26:02 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/tmp/tmpnj8gazhv/responses.md' is 'unk'.
> 2020-08-17 08:26:02 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/tmp/tmpnj8gazhv/stories.md' is 'unk'.
> 2020-08-17 08:26:02 DEBUG    pykwalify.compat  - Using yaml library: /opt/venv/lib/python3.7/site-packages/ruamel/yaml/__init__.py
> /opt/venv/lib/python3.7/site-packages/rasa/utils/common.py:384: UserWarning: Training data file /tmp/tmpnj8gazhv/domain.yml doesn't have a 'version' key. Rasa Open Source will read the file as a version '2.0' file.
>   More info at https://rasa.com/docs/rasa
> 2020-08-17 08:26:03 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/tmp/tmpnj8gazhv/nlu.md' is 'md'.
> 2020-08-17 08:26:06 DEBUG    rasa.nlu.training_data.loading  - Training data format of '/tmp/tmpnj8gazhv/nlu.md' is 'md'.
> /opt/venv/lib/python3.7/site-packages/tensorflow_addons/utils/ensure_tf_install.py:68: UserWarning: Tensorflow Addons supports using Python ops for all Tensorflow versions above or equal to 2.2.0 and strictly below 2.3.0 (nightly versions are not supported).
>  The versions of TensorFlow you are currently using is 2.3.0 and is not supported.
> Some things might work, some things might not.
> If you were to encounter a bug, do not file an issue.
> If you want to make sure you're using a tested and supported configuration, either change the TensorFlow version or the TensorFlow Addons's version.
> You can find the compatibility matrix in TensorFlow Addon's readme:
> https://github.com/tensorflow/addons
>   UserWarning,
> 2020-08-17 08:26:07 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=31, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.48.7', 59520)>; error=111 (Connection refused)
> 2020-08-17 08:26:07 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.48.4', 5672))
> 2020-08-17 08:26:07 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
> 2020-08-17 08:26:09 INFO     transformers.file_utils  - TensorFlow version 2.3.0 available.
> 2020-08-17 08:26:11 DEBUG    rasa.nlu.utils.hugging_face.hf_transformers  - Loading Tokenizer and Model for bert
> 2020-08-17 08:26:11 DEBUG    rasa.server  - Traceback (most recent call last):
>   File "/opt/venv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 987, in _from_pretrained
>     local_files_only=local_files_only,
>   File "/opt/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 260, in cached_path
>     local_files_only=local_files_only,
>   File "/opt/venv/lib/python3.7/site-packages/transformers/file_utils.py", line 362, in get_from_cache
>     os.makedirs(cache_dir, exist_ok=True)
>   File "/usr/local/lib/python3.7/os.py", line 213, in makedirs
>     makedirs(head, exist_ok=exist_ok)
>   File "/usr/local/lib/python3.7/os.py", line 213, in makedirs
>     makedirs(head, exist_ok=exist_ok)
>   File "/usr/local/lib/python3.7/os.py", line 223, in makedirs
>     mkdir(name, mode)
> PermissionError: [Errno 13] Permission denied: '/.cache'


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.7/site-packages/rasa/server.py", line 810, in train
    None, functools.partial(train_model, **info)
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 58, in train
    nlu_additional_arguments=nlu_additional_arguments,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 114, in train_async
    nlu_additional_arguments=nlu_additional_arguments,
  File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 207, in _train_async_internal
    old_model_zip_path=old_model,
  File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 246, in _do_training
    additional_arguments=nlu_additional_arguments,
  File "/opt/venv/lib/python3.7/site-packages/rasa/train.py", line 541, in _train_nlu_with_validated_data
    **additional_arguments,
  File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/train.py", line 75, in train
    trainer = Trainer(nlu_config, component_builder)
  File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/model.py", line 146, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/model.py", line 158, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/components.py", line 786, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/registry.py", line 163, in create_component_by_config
    return component_class.create(component_config, config)
  File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/components.py", line 491, in create
    return cls(component_config)
  File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 50, in __init__
    self._load_model()
  File "/opt/venv/lib/python3.7/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 84, in _load_model
    self.model_weights, cache_dir=self.cache_dir
  File "/opt/venv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 911, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/opt/venv/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 1004, in _from_pretrained
    raise EnvironmentError(msg)
OSError: Couldn't reach server at '{}' to download vocabulary files.

2020-08-17 08:26:12 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=21, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.48.7', 59526)>; error=111 (Connection refused)
2020-08-17 08:26:12 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.48.4', 5672))
2020-08-17 08:26:12 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
2020-08-17 08:26:17 ERROR    pika.adapters.utils.io_services_utils  - Socket failed to connect: <socket.socket fd=21, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.48.7', 59532)>; error=111 (Connection refused)
2020-08-17 08:26:17 ERROR    pika.adapters.utils.connection_workflow  - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('192.168.48.4', 5672))
2020-08-17 08:26:17 ERROR    pika.adapters.utils.connection_workflow  - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')

Does this tell you anything @erohmensing?

@erohmensing I investigated the PermissionError and the OSError. They seem to stem from the BERT transformer. Is it that the server somehow cannot be reached?
I tried training a model without the BERT transformer and it worked. However, I need it for my research project.
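
A note on reading the traceback above: the line "During handling of the above exception, another exception occurred" means the `OSError` was raised while a `PermissionError` was already being handled, so the first exception is the likelier root cause. The sketch below reproduces that pattern with a hypothetical stand-in loader (`load_vocabulary` is not the real transformers function) and walks the chain back to the original error.

```python
def load_vocabulary(cache_dir):
    """Hypothetical stand-in for a loader that hides the real failure."""
    try:
        # Simulate the failed directory creation from the first traceback.
        raise PermissionError(13, "Permission denied", cache_dir)
    except OSError:
        # Raising inside `except` without `from` chains the errors implicitly.
        raise EnvironmentError("Couldn't reach server to download vocabulary files.")


def root_cause(exc):
    """Follow __cause__ / __context__ back to the first exception in the chain."""
    while exc.__cause__ is not None or exc.__context__ is not None:
        exc = exc.__cause__ or exc.__context__
    return exc


if __name__ == "__main__":
    try:
        load_vocabulary("/.cache")
    except OSError as err:
        print("reported:", repr(err))
        print("root cause:", repr(root_cause(err)))
```

Read this way, the "Couldn't reach server" message is a side effect, and the unwritable '/.cache' directory is the thing to fix.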

The thread "Training fails when using HFTransformers" contains the solution for the BERT transformer: I had to add cache_dir: /tmp to the HFTransformersNLP pipeline component.
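
A rough sketch of why pointing the cache at /tmp helps, under two assumptions that the logs suggest but do not prove: the default cache location is derived from `~/.cache`, and the container user's HOME is "/". In that case the default resolves to '/.cache', which a non-root user cannot create. The helper names are made up for illustration.

```python
import os
import posixpath
import tempfile


def cache_root_for_home(home):
    """Return what '~/.cache' expands to when HOME is set to `home`."""
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        return posixpath.expanduser("~/.cache")
    finally:
        # Restore the caller's environment.
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved


def is_usable_cache_dir(path):
    """True if `path` is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    print(cache_root_for_home("/"))  # assumed container case: HOME is "/"
    print(is_usable_cache_dir(tempfile.gettempdir()))
```

Running `is_usable_cache_dir` inside the worker container as the runtime user is a quick way to check a candidate `cache_dir` before retraining.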

However, after training, I got this error in my production container log:

**********************************************************************
  Resource vader_lexicon not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('vader_lexicon')

  For more information see: https://www.nltk.org/data.html

  Attempted to load sentiment/vader_lexicon.zip/vader_lexicon/vader_lexicon.txt

  Searched in:
    - '/nltk_data'
    - '/opt/venv/nltk_data'
    - '/opt/venv/share/nltk_data'
    - '/opt/venv/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************

Did I install the lexicon incorrectly or in the wrong location?

Glad to see you managed to figure out the HFTransformers cache issue. As for the nltk stuff, I’m not sure, as it’s a custom component.

What I’d recommend is exec’ing into the container and looking for the file:

find / -type f -name "vader_lexicon.zip"

Any files that your 1001 user has access to should show up there. If it’s a permission issue rather than a location issue, permissions may need to be updated before switching back to 1001; see Dockerfile_full on the master branch of RasaHQ/rasa on GitHub.


@erohmensing Thanks so much for your help! I actually solved it.

The lexicon was saved to the root folder. To save it in a central location where the script would find it, I needed to change the Dockerfile command to:
RUN python -m nltk.downloader -d /usr/local/share/nltk_data vader_lexicon

Now my training runs and I can talk to the model!
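
For anyone hitting the same lookup error, a small sketch of a check that could be run inside the container as the runtime user. It looks for the lexicon archive in the directories listed in the NLTK error message earlier in this thread. The directory list and relative path are copied from that message; the `find_lexicon` helper itself is an assumption for illustration, not an NLTK API.

```python
import os

# Directories copied from the "Searched in" list of the NLTK error.
SEARCH_DIRS = [
    "/nltk_data",
    "/opt/venv/nltk_data",
    "/opt/venv/share/nltk_data",
    "/opt/venv/lib/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]

# Relative path taken from the "Attempted to load" line of the same error.
LEXICON = os.path.join("sentiment", "vader_lexicon.zip")


def find_lexicon(search_dirs=SEARCH_DIRS):
    """Return every candidate path that exists and is readable by this user."""
    hits = []
    for base in search_dirs:
        candidate = os.path.join(base, LEXICON)
        if os.path.isfile(candidate) and os.access(candidate, os.R_OK):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    found = find_lexicon()
    print(found if found else "vader_lexicon.zip not found in any searched directory")
```

An empty result points to a location problem, while a file that `find` shows but this check misses points to a permission problem.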

Awesome. Glad you’re up and running! :rocket: