R, Docker and Checkpoint: A Route to Reproducibility

2019-08-28 R Docker Andrew B. Collier

I need to deploy Shiny on a Windows machine. I also need to use {checkpoint} for package management. Using Docker seems to be the only reasonable approach to Shiny on Windows. But how easy would it be to also factor {checkpoint} into this setup?

Only one reasonable way to find out: give it a try.

Below is the simple Dockerfile I used. Here are the fundamental components of what it does:

  • Derive from the rocker/r-ver:3.6.1 image (R 3.6.1) from rocker.
  • Create an environment variable CHECKPOINT_DATE with the snapshot date for {checkpoint}.
  • Install the {checkpoint} package.
  • Make a snapshot folder for {checkpoint}.
  • Add commands to .Rprofile which will load {checkpoint} and select the required snapshot.
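  • Install a sample package under {checkpoint}.

# Base image: R 3.6.1 from the rocker project.
FROM rocker/r-ver:3.6.1

# Snapshot date that {checkpoint} will use.
ENV CHECKPOINT_DATE=2018-12-01

# Install {checkpoint}, create its snapshot folder, activate the snapshot in
# .Rprofile, then install a sample package with that snapshot active.
RUN R -e "install.packages('checkpoint')" && \
    mkdir -p /root/.checkpoint/${CHECKPOINT_DATE} && \
    echo "library(checkpoint); checkpoint('${CHECKPOINT_DATE}', scanForPackages = FALSE);" >~/.Rprofile && \
    R -e "install.packages('colorspace')"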

After building the Docker image you’re ready to give this a whirl. There are (at least) two ways that you could use this:

  • install packages onto the image (which bloats the image and requires a rebuild for every new package) or
  • share a {checkpoint} snapshot folder on the host with the Docker container as a volume.
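
Either option starts with building the image. Here is a minimal sketch, assuming the Dockerfile sits in the current directory. The image tag r-checkpoint is a hypothetical name chosen for illustration; the build is wrapped in a small helper function so that nothing runs until you call it.

```shell
# Hypothetical image tag; the original post does not name the image.
IMAGE="r-checkpoint"

# Build the image from the Dockerfile in the current directory.
build_image() {
    docker build -t "$IMAGE" .
}
```

Calling build_image from the folder that contains the Dockerfile runs the build.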

Packages on the Image

Let’s look at the option of installing packages on the image first. The Dockerfile above already installs the {colorspace} package. Let’s test that out.

In the screenshot below, the top panel shows the launch of the image and the successful loading of the {colorspace} package. The lower panel connects a shell to the running container and lists the contents of the snapshot folder to confirm that the {colorspace} package is there.
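
The two panels can be approximated with commands along these lines. This is a sketch rather than the exact commands from the screenshot: the image tag and container name are hypothetical, and the snapshot path is taken from the Dockerfile above.

```shell
# Hypothetical names; neither appears in the original post.
IMAGE="r-checkpoint"
CONTAINER="r-checkpoint-test"

# Snapshot folder created by the Dockerfile for the 2018-12-01 snapshot.
SNAPSHOT_DIR="/root/.checkpoint/2018-12-01"

# Top panel: start an interactive R session, then try library(colorspace).
launch_r() {
    docker run --rm -it --name "$CONTAINER" "$IMAGE" R
}

# Lower panel: from a second terminal, search the snapshot folder of the
# running container for the installed {colorspace} package.
find_colorspace() {
    docker exec "$CONTAINER" find "$SNAPSHOT_DIR" -type d -name colorspace
}
```

Run launch_r in one terminal and find_colorspace in another while the R session is still open.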

Packages on the Host

If you share a snapshot folder from the host with the container then you get a lot more flexibility.

In the screenshot below, the top panel shows the launch of the image with the ~/.checkpoint folder on the host shared with the container. Now it’s possible to select any of the snapshots present on the host. For example, rather than choosing the 2018-12-01 snapshot installed on the image, we can select the 2019-06-01 snapshot from the host. Note that packages installed in the container will now be stored in the snapshot folder on the host.
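
Sharing the folder comes down to a bind mount when the container is launched. The sketch below assumes a Unix-like shell on the host and again uses a hypothetical image tag; the container path matches the one created in the Dockerfile.

```shell
# Hypothetical image tag; the original post does not name the image.
IMAGE="r-checkpoint"

# Host snapshot folder and the location the Dockerfile uses in the container.
HOST_CHECKPOINT="${HOME}/.checkpoint"
CONTAINER_CHECKPOINT="/root/.checkpoint"

# Start R with the host's snapshot folder mounted over the container's.
launch_with_host_snapshots() {
    docker run --rm -it -v "${HOST_CHECKPOINT}:${CONTAINER_CHECKPOINT}" "$IMAGE" R
}
```

Because the bind mount replaces the container's view of /root/.checkpoint, only the snapshots present on the host are visible inside the session. From there a call such as checkpoint('2019-06-01', scanForPackages = FALSE) should select the 2019-06-01 snapshot, provided that folder exists on the host.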

Of these two options the latter seems like a more flexible solution. If, however, your aim is to provide a Docker image with a complete (and reproducible) computational environment, then the former is definitely the way to go: it’s less flexible but the package versions are all locked down on the image.
