When Docker images become fixed-point

Simon Tournier — October 22, 2021

We like to say that Docker images are like smoothies: you can immediately tell whether it’s to your liking, but you can hardly guess what the ingredients are. Although containers are an efficient way to ship things, the core question is how these things are produced.

The aim of this post is to demonstrate that the issue is not Docker images by themselves. Instead the concrete question when talking about reproducibility is: where do binaries come from, and using which tool?

The scenario below illustrates how one can ship reproducible and verifiable Docker images built by guix pack. It was initially written as a comment while reviewing patch #45919.

Alice generates a Docker image

Alice is working on a standard scientific stack using Python. Alongside her project, she stores two files: manifest.scm, which contains the package set, and channels.scm, which records the state of Guix (in other words, its revision). With these two files, one can redeploy the exact same computational environment using guix time-machine.

Concretely, manifest.scm reads:

(specifications->manifest
 (list
  "python"
  "python-numpy"))

Alice produces the channels.scm file by running guix describe -f channels, which returns this:

(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "fb32a38db1d3a6d9bc970e14df5be95e59a8ab02")
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA")))))

So far, so good. Because Alice needs to run this stack on some infrastructure not running Guix but instead running Docker, she just packs her scientific stack with this command:

guix pack -f docker --save-provenance -m manifest.scm

For the next step, one option is to locally load the generated tarball using Docker tools, like so:

$ docker load < /gnu/store/6rga6pz60di21mn37y5v3lvrwxfvzcz9-python-python-numpy-docker-pack.tar.gz
Loaded image: python-python-numpy:latest
$ docker images
REPOSITORY                                TAG          IMAGE ID       CREATED         SIZE
python-python-numpy                       latest       ea2d5e62b2d2   51 years ago    431MB

… then running docker push to upload the image to a registry.
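The "51 years ago" in the CREATED column is not a glitch. A likely explanation, stated here as an assumption rather than something this post spells out, is that the image creation time is pinned close to the Unix epoch so that the clock of the build machine cannot leak into the image. The Python sketch below only does the arithmetic; the constant PINNED_TIMESTAMP is a value picked for illustration.

```python
from datetime import datetime, timezone

# Assumption for illustration: the image creation time is pinned to one
# second after the Unix epoch instead of the actual build time.
PINNED_TIMESTAMP = 1

created = datetime.fromtimestamp(PINNED_TIMESTAMP, tz=timezone.utc)
# Publication date of this post, used as "now".
published = datetime(2021, 10, 22, tzinfo=timezone.utc)

age_in_years = (published - created).days // 365
print(created.isoformat())   # 1970-01-01T00:00:01+00:00
print(age_in_years)          # 51
```

A fixed timestamp is one less input that could differ between two builds of the same image.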

The second option is to transfer the image to the target computer and to run the Docker commands shown above over there. Once the image has been loaded on the target machine, running Python from that image just works:

$ docker run -ti python-python-numpy:latest python3
Python 3.8.2 (default, Jan  1 1970, 00:00:01)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> A = np.array([[1,0,1],[0,1,0],[0,0,1]])
>>> _, s, _ = np.linalg.svd(A); s; abs(s[0] - 1./s[2])
array([1.61803399, 1.        , 0.61803399])
0.0
>>> quit()

Neat!
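The singular values printed above are not arbitrary: 1.61803399 is the golden ratio and 0.61803399 is its inverse, which is why the last expression evaluates to 0.0. As a cross-check that does not need NumPy, the sketch below derives them by hand from the eigenvalues of the transpose of A times A. The helper name singular_values_of_A is made up for this illustration.

```python
import math

def singular_values_of_A():
    """Singular values of A = [[1,0,1],[0,1,0],[0,0,1]], derived by hand.

    A^T A = [[1,0,1],[0,1,0],[1,0,2]].  Its middle row and column decouple
    (eigenvalue 1); the remaining 2x2 block [[1,1],[1,2]] has trace 3 and
    determinant 1, so its eigenvalues are (3 ± sqrt(5)) / 2.
    """
    trace, det = 3.0, 1.0
    disc = math.sqrt(trace * trace - 4.0 * det)
    eigenvalues = [(trace + disc) / 2.0, 1.0, (trace - disc) / 2.0]
    # Singular values are the square roots of the eigenvalues of A^T A.
    return [math.sqrt(ev) for ev in eigenvalues]

s = singular_values_of_A()
phi = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio

print([round(x, 8) for x in s])          # [1.61803399, 1.0, 0.61803399]
print(math.isclose(s[0], phi))           # True
print(math.isclose(s[0], 1.0 / s[2]))    # True
```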

On a side note, the Docker image is produced directly by Guix. That is, Guix manages everything, from the binary packages and all the requirements to the Docker image itself, with no Dockerfile involved. To guix pack, Docker images are one container format among others; for instance, guix pack -f squashfs --save-provenance -m manifest.scm generates a Singularity image (another container format) with the exact same binaries inside.

Bob retrieves and runs code from Alice’s image

Bob works with Alice's Docker image. He needs to run these exact same versions on another machine using plain relocatable tarballs, for example. Or he needs to scrutinize how all the binaries in this stack are produced, because maybe he found a bug and wants to know whether all the results obtained with this Docker image are correct. Or maybe he wants to study a specific aspect to better understand a specific result. Bob is doing science and thus Bob needs transparency.

The files manifest.scm and channels.scm sadly disappeared a long time ago, probably at the end of Alice's postdoc. Had the Docker image been produced with a Dockerfile, the game would most likely be over: running docker build on that Dockerfile would probably give a different result than back then (for instance because it starts by running apt-get update), or it may simply fail because some of the resources it refers to have vanished from the Internet. There are ways to mitigate it, for instance by resorting to Debian’s snapshot service and/or using debuerreotype to recreate the image, assuming everything in the image was taken from Debian. But overall, it’s safe to assume that a regular Dockerfile does not describe a reproducible build process.

Fortunately, Bob remembers this Docker image had been produced with Guix (pack --save-provenance). Let’s get back the recipe of this smoothie.

First, let’s start the container, which makes it easier to export as a plain tarball. Second, let’s extract the embedded Guix profile:

$ docker run -d python-python-numpy:latest python3
e1775ff836915dc55195eafd1710eec07106bd1677bde153e5842a0ded43395d
$ docker export -o /tmp/re-pack.tar $(docker ps -a --format "{{.ID}}"| head -n1)

$ tar -xf /tmp/re-pack.tar $(tar -tf /tmp/re-pack.tar | grep 'profile/manifest')
$ tree gnu
gnu
└── store
    └── ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile
        └── manifest

2 directories, 1 file
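For readers who prefer not to chain tar and grep, the same extraction can be sketched with Python's standard tarfile module. This is only an illustration of the step above: the function name extract_profile_manifests and its dest parameter are hypothetical, and the suffix test is a slightly stricter approximation of the grep pattern.

```python
import tarfile

def extract_profile_manifests(tar_path, dest="."):
    """Extract every '<...>-profile/manifest' member of an exported container.

    Returns the list of member names that were extracted.
    """
    with tarfile.open(tar_path) as archive:
        wanted = [
            member for member in archive.getmembers()
            if member.isfile() and member.name.endswith("-profile/manifest")
        ]
        archive.extractall(path=dest, members=wanted)
    return [member.name for member in wanted]
```

Calling extract_profile_manifests("/tmp/re-pack.tar") from the current directory would recreate the same gnu/store tree as shown above.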

Wow! Is it really a regular profile? Yes, it is! Because that profile contains provenance metadata (thanks to --save-provenance), we can ask Guix to export that metadata in the form of a list of channels and a manifest:

$ guix package -p gnu/store/ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile --export-channels
;; This channel file can be passed to 'guix pull -C' or to
;; 'guix time-machine -C' to obtain the Guix revision that was
;; used to populate this profile.

(list
     (channel
       (name 'guix)
       (url "https://git.savannah.gnu.org/git/guix.git")
       (commit
         "fb32a38db1d3a6d9bc970e14df5be95e59a8ab02")
       (introduction
         (make-channel-introduction
           "9edb3f66fd807b096b48283debdcddccfea34bad"
           (openpgp-fingerprint
             "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
)

$ guix package -p gnu/store/ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile --export-manifest
;; This "manifest" file can be passed to 'guix package -m' to reproduce
;; the content of your profile.  This is "symbolic": it only specifies
;; package names.  To reproduce the exact same profile, you also need to
;; capture the channels being used, as returned by "guix describe".
;; See the "Replicating Guix" section in the manual.

(specifications->manifest
  (list "python" "python-numpy"))

Awesome, isn't it? These last two outputs are equivalent to Alice's channels.scm and manifest.scm files. At this stage, Bob is a happy person: once he has saved them as new-channels.scm and new-manifest.scm, he can take these two files anywhere and rebuild the exact same image at any time:

guix time-machine -C new-channels.scm \
     -- pack -f docker --save-provenance -m new-manifest.scm

The command should produce the exact same docker-pack.tar.gz that Alice provided, bit for bit. If it does not, then either the original image has been tampered with, or one of the package build processes involved is non-deterministic, which is something we invite you to report as a bug!
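Checking "bit for bit" comes down to comparing cryptographic digests of the two tarballs. Here is a minimal sketch in Python, assuming both files are available locally; the function names sha256_of and same_bits are placeholders for this illustration, not part of Guix.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the hexadecimal SHA-256 digest of the file at PATH."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read in chunks so that large image tarballs need not fit in memory.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_bits(path_a, path_b):
    """True when both files have identical content."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Bob would call same_bits on Alice's tarball and on the one he just rebuilt; a False result is the cue to investigate further.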

Join the fun, join us!

Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).
