I’m using the ConveRT Featurizer in my pipeline. How do I package the TF-Hub module from PolyAI during the Docker build stage instead of downloading it at runtime every time?
The following are the logs from `rasa run`:
2020-01-31 11:11:17 INFO absl - Using /var/folders/mq/dt_jljbj6r91d67j97mmjhth0000gn/T/tfhub_modules to cache modules.
2020-01-31 11:11:17 INFO absl - Downloading TF-Hub Module 'http://models.poly-ai.com/convert/v1/model.tar.gz'.
2020-01-31 11:11:33 INFO absl - Downloading http://models.poly-ai.com/convert/v1/model.tar.gz: 148.59MB
2020-01-31 11:11:33 INFO absl - Downloaded http://models.poly-ai.com/convert/v1/model.tar.gz, Total size: 152.02MB
2020-01-31 11:11:33 INFO absl - Downloaded TF-Hub Module 'http://models.poly-ai.com/convert/v1/model.tar.gz'.
Setting TFHUB_CACHE_DIR did update the path, but it didn’t stop the model from being downloaded again at Docker runtime. However, I did observe that a new folder with a few files was created inside the Docker container. I copied these files over to my host, built a new Docker image with them included, and lo and behold, it worked like a charm!
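For reference, the copy-the-cache approach can be sketched directly in the Dockerfile (a minimal sketch; the `models/tfhub` host directory and the `/app` target path are assumptions from my setup, so adjust them for your own layout):

```dockerfile
# Copy the TF-Hub cache folder (previously exported from a running
# container) into the image, then point TF-Hub at it via the env var.
COPY models/tfhub /app/models/tfhub
ENV TFHUB_CACHE_DIR="/app/models/tfhub"
```

With the cache directory baked into the image and TFHUB_CACHE_DIR pointing at it, TF-Hub finds the module on disk at startup instead of downloading it.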
Docker logs:
2020-02-03 02:29:00 INFO absl - Using models/tfhub to cache modules.
2020-02-03 02:29:04 DEBUG rasa.core.tracker_store - Connected to InMemoryTrackerStore.
2020-02-03 02:29:04 DEBUG rasa.core.lock_store - Connected to lock store 'InMemoryLockStore'.
2020-02-03 02:29:05 DEBUG rasa.model - Extracted model to '/tmp/tmp1wvz5fcv'.
2020-02-03 02:29:05 DEBUG pykwalify.compat - Using yaml library: /usr/local/lib/python3.7/site-packages/ruamel/yaml/__init__.py
2020-02-03 02:29:06 DEBUG rasa.core.nlg.generator - Instantiated NLG to 'TemplatedNaturalLanguageGenerator'.
@KarthiAru,
An alternative approach is to add this to the bottom of your Dockerfile:
USER 1001
#########################################################
# Preload ConveRT model                                 #
# https://github.com/PolyAI-LDN/polyai-models#convert   #
# This way, it will not be downloaded during deployment #
#########################################################
ENV TFHUB_CACHE_DIR="/var/tmp/tfhub_modules"
RUN python -c "import rasa.utils.train_utils as train_utils; train_utils.load_tf_hub_model('http://models.poly-ai.com/convert/v1/model.tar.gz')"
RUN ls -l $TFHUB_CACHE_DIR
I am using a helper function from rasa to ‘pre-load’ the model. This triggers the download during the image build, so the cached module ends up baked into the image, all set up properly.
Now, when the rasa container starts up, it will not download the model again.
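For what it’s worth, the reason the preloaded cache is found at runtime is that, as far as I understand tensorflow_hub’s internals (this is not an official API), each cached module lives in a subdirectory named after the SHA-1 hex digest of the module URL. As long as TFHUB_CACHE_DIR and the URL are the same at build time and at runtime, the resolver sees that directory and skips the download. A quick sketch of the cache-key derivation:

```python
import hashlib

def tfhub_cache_key(handle: str) -> str:
    # tensorflow_hub derives the cache subdirectory name from the
    # SHA-1 hex digest of the module handle (the URL string).
    return hashlib.sha1(handle.encode("utf8")).hexdigest()

# The ConveRT module would be cached under $TFHUB_CACHE_DIR/<this digest>
print(tfhub_cache_key("http://models.poly-ai.com/convert/v1/model.tar.gz"))
```

If the digest-named folder you see under your cache directory matches this output, you know the runtime lookup will hit the preloaded copy.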