Using ConveRT Tokenizer Offline?

Hi,

I’m very new to Rasa and glad to join this awesome community!

I recently started looking into NLU and tried to use Rasa’s NLU module from Python. The recommended pipeline uses the ConveRT tokenizer. I tried training with it, but sometimes it takes very long to load the model. On closer inspection, I noticed that the trainer downloads ConveRT from poly.ai every time I restart my computer, hence the question: is it possible to use the ConveRT tokenizer offline? Can I keep a copy of the downloaded model instead of having to download it every time?

Thanks!

Rasa version: 1.8.1

Python version: 3.7

Hey @hliyanto, great to have you join the community!

So TFHub uses a cache directory, which is set to /tmp/... by default. That of course gets cleared on every restart, and then the model has to be downloaded again. If you want the model to persist somewhere that doesn’t get cleared, you can set the TFHUB_CACHE_DIR environment variable to a custom directory.
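Here’s a minimal sketch of what that can look like when training NLU from Python. The cache path and file names are just examples, and it assumes the `train_nlu` helper from Rasa 1.x; you can equally just `export TFHUB_CACHE_DIR=...` in your shell before running `rasa train nlu`:

```python
import os

# Point TF Hub's cache at a directory that survives reboots.
# Example path -- any writable location works.
os.environ["TFHUB_CACHE_DIR"] = os.path.expanduser("~/.tfhub_cache")

# Set the variable before the ConveRT component loads the model,
# i.e. before kicking off training.
from rasa.train import train_nlu

train_nlu(
    config="config.yml",     # pipeline with ConveRTTokenizer / ConveRTFeaturizer
    nlu_data="data/nlu.md",  # your NLU training data
    output="models/",
)
```

After the first run, the model should be picked up from that directory instead of being downloaded again.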

And to your question about using the ConveRT pipeline offline - yes of course you can, once the model is downloaded :slight_smile:


Thank you, that’s awesome!