Caching TF-Hub modules

Hi,

I’m using the ConveRT Featurizer in my pipeline. How do I package the TF-Hub module from poly-ai during the docker build stage instead of downloading it every time during the runtime?

Here are the logs from `rasa run`:

2020-01-31 11:11:17 INFO     absl  - Using /var/folders/mq/dt_jljbj6r91d67j97mmjhth0000gn/T/tfhub_modules to cache modules.
2020-01-31 11:11:17 INFO     absl  - Downloading TF-Hub Module 'http://models.poly-ai.com/convert/v1/model.tar.gz'.
2020-01-31 11:11:33 INFO     absl  - Downloading http://models.poly-ai.com/convert/v1/model.tar.gz: 148.59MB
2020-01-31 11:11:33 INFO     absl  - Downloaded http://models.poly-ai.com/convert/v1/model.tar.gz, Total size: 152.02MB
2020-01-31 11:11:33 INFO     absl  - Downloaded TF-Hub Module 'http://models.poly-ai.com/convert/v1/model.tar.gz'.

I haven’t tested this myself, but you can try the following. You’ll need a couple of custom steps inside the Dockerfile:

  1. Download the ConveRT model from http://models.poly-ai.com/convert/v1/model.tar.gz and place it in a directory inside the image.
  2. Set the TFHUB_CACHE_DIR environment variable to the directory from step 1.

At runtime it should then load the already-downloaded model instead of fetching it again. Let me know if this works.
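The two steps above might look roughly like this in a Dockerfile. This is an untested sketch: the cache path `/app/tfhub_modules` and the use of curl are my assumptions, so adjust them for your base image.

```dockerfile
# Step 2: point TF-Hub's cache at a directory baked into the image.
# /app/tfhub_modules is an arbitrary choice, not a required path.
ENV TFHUB_CACHE_DIR=/app/tfhub_modules

# Step 1: download the ConveRT archive into that directory.
# Assumes curl is available in the base image; use wget if not.
RUN mkdir -p $TFHUB_CACHE_DIR \
    && curl -sSL http://models.poly-ai.com/convert/v1/model.tar.gz \
       -o $TFHUB_CACHE_DIR/model.tar.gz
```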

Setting TFHUB_CACHE_DIR did update the cache path, but it didn’t stop the model from being downloaded again at container runtime. However, I did notice that a new folder with a few files had been created inside the container. I copied those files over to my host, rebuilt the Docker image with them included, and lo and behold, it worked like a charm!
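For anyone wanting to reproduce the copy step: you can pull the populated cache out of a running container with `docker cp` and then bake it into the next build. The container name and paths below are placeholders for your own setup.

```shell
# Copy the populated TF-Hub cache from the container to the host.
# "rasa_container" and both paths are placeholders.
docker cp rasa_container:/app/models/tfhub ./models/tfhub

# Then, in the Dockerfile for the new image:
#   COPY models/tfhub /app/models/tfhub
#   ENV TFHUB_CACHE_DIR=/app/models/tfhub
```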

Docker logs:

2020-02-03 02:29:00 INFO     absl  - Using models/tfhub to cache modules.
2020-02-03 02:29:04 DEBUG    rasa.core.tracker_store  - Connected to InMemoryTrackerStore.
2020-02-03 02:29:04 DEBUG    rasa.core.lock_store  - Connected to lock store 'InMemoryLockStore'.
2020-02-03 02:29:05 DEBUG    rasa.model  - Extracted model to '/tmp/tmp1wvz5fcv'.
2020-02-03 02:29:05 DEBUG    pykwalify.compat  - Using yaml library: /usr/local/lib/python3.7/site-packages/ruamel/yaml/__init__.py
2020-02-03 02:29:06 DEBUG    rasa.core.nlg.generator  - Instantiated NLG to 'TemplatedNaturalLanguageGenerator'.

Details of the files:

models
└── tfhub
    ├── model.tar.gz
    ├── 61ee56c901ee8aa67fa63e1152683dfe55693b04.descriptor.txt
    └── 61ee56c901ee8aa67fa63e1152683dfe55693b04
        ├── assets
        ├── saved_model.pb
        ├── tfhub_module.pb
        └── variables
            ├── variables.data-00000-of-00002
            ├── variables.data-00001-of-00002
            └── variables.index
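About that hash-named folder: as far as I can tell from tensorflow_hub’s resolver code, the cache subdirectory is named after the SHA-1 hex digest of the module URL, so you can predict the expected folder name up front. This is my reading of the library, not verified against every version.

```python
import hashlib


def tfhub_cache_subdir(handle: str) -> str:
    """Cache subdirectory name TF-Hub is believed to use for a module:
    the SHA-1 hex digest of the module handle (its URL)."""
    return hashlib.sha1(handle.encode("utf8")).hexdigest()


print(tfhub_cache_subdir("http://models.poly-ai.com/convert/v1/model.tar.gz"))
```

If that assumption holds, the printed digest should match the 61ee56c9… directory shown in the listing above.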

I also found a post on Medium that shows how to do this via code: “How to run TF hub locally without internet connection”.

@KarthiAru, an alternative approach is to add this to the bottom of your Dockerfile:

USER 1001

#########################################################
# Preload ConveRT model                                 #
# https://github.com/PolyAI-LDN/polyai-models#convert   #
# This way, it will not be downloaded within deployment #
#########################################################

ENV TFHUB_CACHE_DIR="/var/tmp/tfhub_modules"
RUN python -c "import rasa.utils.train_utils as train_utils; train_utils.load_tf_hub_model('http://models.poly-ai.com/convert/v1/model.tar.gz')"
RUN ls -l $TFHUB_CACHE_DIR

I am using a helper function from Rasa to pre-load the model. This triggers the download during the image build, so the model ends up baked into the Docker image with everything set up properly.

Now, when the Rasa container starts up, it will not download the model again.