Hello,
I am trying to extend `rasa/rasa:2.1.2-spacy-en` to include spaCy’s `ja_core_news_md`.
With the help of rasa-demo and the answer here, I came up with this Dockerfile:
```dockerfile
FROM rasa/rasa:2.1.2-spacy-en
# Use subdirectory as working directory
WORKDIR /app
# Change back to root user to install dependencies
USER root
RUN apt-get install -y gcc && \
    apt-get autoremove -y
RUN pip install spacy==2.3.2 spacy-lookups-data --no-cache-dir
RUN python -m spacy download ja_core_news_md && \
    python -m spacy link ja_core_news_md ja
# Switch back to non-root to run code
USER 1001
```
The resulting image successfully runs a model that I had previously trained on Windows (outside Docker). Notes:
- I had to update `spacy` to `2.3.2` because the `ja_core_news_md` model first became available in that version.
- I had to install `gcc` because without it there was a build error from `sudachipy`.
- I moved from non-Docker (Windows, using pyenv and then conda) to Docker because I want to use the GPU through WSL2 during training. I also work across different operating systems.
My issue is the size of the Docker image.
```
REPOSITORY   TAG                 IMAGE ID       CREATED          SIZE
rasa/rasa    2.1.2-spacy-en-ja   90f6e904afb4   13 minutes ago   2.39GB
rasa/rasa    2.1.2-full          1ae20eafdcbd   47 hours ago     1.91GB
rasa/rasa    2.1.2-spacy-en      31ca289ae941   47 hours ago     1.82GB
```
The `spacy-en-ja` image is about 0.48 GB bigger than `2.1.2-full`, and about 0.57 GB bigger than the `2.1.2-spacy-en` image it extends.
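One variant that might reduce the size, as an untested sketch: the same steps run in a single `RUN` instruction, so that `gcc`, the apt package lists and any pip cache are removed before the layer is committed. Deleting files in a later `RUN` does not shrink the image, because the earlier layer still contains them. The sketch assumes that `gcc` is only needed at build time to compile `sudachipy` and that nothing in the base image depends on it; it also adds `apt-get update`, which is the usual practice before `apt-get install` but was not in the original.

```dockerfile
FROM rasa/rasa:2.1.2-spacy-en

# Use subdirectory as working directory
WORKDIR /app

# Change back to root user to install dependencies
USER root

# Everything in one RUN so that gcc, the apt lists and the pip cache never
# persist in a committed layer.
# Assumption: gcc is only needed to compile sudachipy during the build.
RUN apt-get update && \
    apt-get install -y gcc && \
    pip install --no-cache-dir spacy==2.3.2 spacy-lookups-data && \
    python -m spacy download ja_core_news_md && \
    python -m spacy link ja_core_news_md ja && \
    apt-get purge -y --auto-remove gcc && \
    rm -rf /var/lib/apt/lists/* "${HOME}/.cache/pip"

# Switch back to non-root to run code
USER 1001
```

`docker history rasa/rasa:2.1.2-spacy-en-ja` lists the size added by each layer, which can show whether the extra space comes from `gcc`, the spaCy upgrade, or the Japanese model itself. The same layering cost applies to the upgrade: the older spaCy version presumably installed in the base image stays in the lower layer even after pip replaces it.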
I will try building from the cloned Rasa repository with my changes. I haven’t done so yet because my Internet connection had already slowed down.
I would appreciate any tips for improving this Dockerfile.