When Docker images become fixed-point
We like to say that Docker images are like smoothies: you can immediately tell whether it’s to your liking, but you can hardly guess what the ingredients are. Although containers are an efficient way to ship software, the core question is how that software is produced.
The aim of this post is to demonstrate that the issue is not Docker images by themselves. Instead, the concrete question when talking about reproducibility is: where do binaries come from, and which tool produced them?
The scenario below illustrates how one can ship reproducible and verifiable Docker images built by guix pack. It was initially written as a comment while reviewing patch #45919.
Alice generates a Docker image
Alice is working on a standard scientific stack using Python.
She stores alongside her project the files manifest.scm, containing the package set, and channels.scm, containing the state of Guix (in other words, its revision). With these two files, one can redeploy the exact same computational environment using guix time-machine.
Concretely, manifest.scm reads:
(specifications->manifest
 (list
  "python"
  "python-numpy"))
Alice produces the channels.scm file by running guix describe -f channels, which returns this:
(list (channel
        (name 'guix)
        (url "https://git.savannah.gnu.org/git/guix.git")
        (commit
          "fb32a38db1d3a6d9bc970e14df5be95e59a8ab02")
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
So far, so good. Because Alice needs to run this stack on some infrastructure not running Guix but instead running Docker, she just packs her scientific stack with this command:
guix pack -f docker --save-provenance -m manifest.scm
For the next step, one option is to locally load the generated tarball using Docker tools, like so:
$ docker load < /gnu/store/6rga6pz60di21mn37y5v3lvrwxfvzcz9-python-python-numpy-docker-pack.tar.gz
Loaded image: python-python-numpy:latest
$ docker images
REPOSITORY            TAG       IMAGE ID       CREATED        SIZE
python-python-numpy   latest    ea2d5e62b2d2   51 years ago   431MB
… then running docker push to upload the image to a registry.
The second option is to transfer the image to the target computer and run the Docker commands shown above there. Once the image has been loaded on the target machine, running Python from that image just works:
$ docker run -ti python-python-numpy:latest python3
Python 3.8.2 (default, Jan 1 1970, 00:00:01)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> A = np.array([[1,0,1],[0,1,0],[0,0,1]])
>>> _, s, _ = np.linalg.svd(A); s; abs(s[0] - 1./s[2])
array([1.61803399, 1. , 0.61803399])
0.0
>>> quit()
Neat!
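The singular values printed in this session are no accident: for this particular matrix they are the golden ratio, 1, and the inverse of the golden ratio, which is why s[0] - 1./s[2] vanishes. The sketch below checks that claim using only the standard library. It relies on a hand-derived closed form rather than on NumPy, and the helper name singular_values_of_A is ours, chosen for illustration.

```python
import math


def singular_values_of_A():
    """Closed-form singular values of A = [[1,0,1],[0,1,0],[0,0,1]].

    A^T A = [[1,0,1],[0,1,0],[1,0,2]]; its eigenvalues are 1 and the
    two roots of t^2 - 3t + 1 = 0, that is (3 +/- sqrt(5)) / 2.
    The singular values of A are the square roots of those eigenvalues.
    """
    root5 = math.sqrt(5.0)
    eigenvalues = [(3.0 + root5) / 2.0, 1.0, (3.0 - root5) / 2.0]
    return [math.sqrt(t) for t in eigenvalues]


s = singular_values_of_A()
phi = (1.0 + math.sqrt(5.0)) / 2.0  # golden ratio

# Rounded to 8 decimals for comparison with the NumPy session.
print([round(x, 8) for x in s])
# Largest value is phi, and the largest and smallest are reciprocal.
print(math.isclose(s[0], phi), math.isclose(s[0], 1.0 / s[2]))
```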
On a side note, the Docker image is produced directly by Guix. That is, Guix manages everything, from the binary packages and all their requirements to the Docker image itself, with no Dockerfile involved. To guix pack, Docker images are one container format among others; for instance, guix pack -f squashfs --save-provenance -m manifest.scm generates a Singularity image (another container format) with the exact same binaries inside.
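To make the point concrete that only the format flag changes between container types, here is a small sketch that assembles the two command lines quoted in this post. It does not run Guix; it only builds the argument lists. The function name guix_pack_command is hypothetical, and the only flags used are those already shown above.

```python
import shlex


def guix_pack_command(image_format, manifest="manifest.scm"):
    """Return the argv list for a provenance-preserving 'guix pack' call.

    Only the value given to -f differs between container formats; the
    manifest, and hence the package set, stays the same.
    """
    return ["guix", "pack", "-f", image_format,
            "--save-provenance", "-m", manifest]


for fmt in ("docker", "squashfs"):
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(guix_pack_command(fmt)))
```

Such a list can be handed to subprocess.run on a machine where Guix is installed.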
Bob retrieves and runs code from Alice’s image
Bob works with Alice's Docker image. He needs to run these exact same versions on another machine using plain relocatable tarballs, for example. Or he needs to scrutinize how all the binaries in this stack are produced, because maybe he found a bug and wants to know if all the results obtained with this Docker image are correct or not. Or maybe he wants to study a specific aspect to better understand a specific result. Bob is doing science and thus Bob needs transparency.
The files manifest.scm and channels.scm sadly disappeared a long time ago, probably at the end of Alice's postdoc. Had the Docker image been produced with a Dockerfile, the game would most likely be over: running docker build on that Dockerfile would probably give a different result than back then (for instance because it starts by running apt-get update), or it may simply fail because some of the resources it refers to have vanished from the Internet. There are ways to mitigate it, for instance by resorting to Debian’s snapshot service and/or using debuerreotype to recreate the image, assuming everything in the image was taken from Debian. But overall, it’s safe to assume that a regular Dockerfile does not describe a reproducible build process.
Fortunately, Bob remembers this Docker image had been produced with Guix (pack --save-provenance). Let’s get back the recipe of this smoothie.
First, let’s start the container, which makes it easier to export as a plain tarball. Second, let’s extract the embedded Guix profile:
$ docker run -d python-python-numpy:latest python3
e1775ff836915dc55195eafd1710eec07106bd1677bde153e5842a0ded43395d
$ docker export -o /tmp/re-pack.tar $(docker ps -a --format "{{.ID}}"| head -n1)
$ tar -xf /tmp/re-pack.tar $(tar -tf /tmp/re-pack.tar | grep 'profile/manifest')
$ tree gnu
gnu
└── store
    └── ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile
        └── manifest

2 directories, 1 file
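The tar and grep pipeline above can also be sketched with Python's standard tarfile module, which may be handy on machines where scripting the extraction is easier than shell quoting. This is an illustration under assumptions: the function names are ours, and the match is on names ending in profile/manifest, which is slightly stricter than the substring match done by grep.

```python
import tarfile


def find_profile_manifests(tar_path):
    """Return the member names ending in 'profile/manifest' in a tarball."""
    with tarfile.open(tar_path) as archive:
        return [name for name in archive.getnames()
                if name.endswith("profile/manifest")]


def extract_members(tar_path, names, destination="."):
    """Extract only the named members of the tarball into 'destination'."""
    with tarfile.open(tar_path) as archive:
        members = [archive.getmember(name) for name in names]
        archive.extractall(path=destination, members=members)
```

With the tarball exported above, calling extract_members("/tmp/re-pack.tar", find_profile_manifests("/tmp/re-pack.tar")) should leave the same gnu/store/…-profile/manifest file in the current directory.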
Wow! Is it really a regular profile? Yes, it is! Because that profile contains provenance metadata (thanks to --save-provenance), we can ask Guix to export that metadata in the form of a list of channels and a manifest:
$ guix package -p gnu/store/ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile --export-channels
;; This channel file can be passed to 'guix pull -C' or to
;; 'guix time-machine -C' to obtain the Guix revision that was
;; used to populate this profile.
(list
  (channel
    (name 'guix)
    (url "https://git.savannah.gnu.org/git/guix.git")
    (commit
      "fb32a38db1d3a6d9bc970e14df5be95e59a8ab02")
    (introduction
      (make-channel-introduction
        "9edb3f66fd807b096b48283debdcddccfea34bad"
        (openpgp-fingerprint
          "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))
)
$ guix package -p gnu/store/ia1sxr3qf3w9dj7y48rwvwyx289vpfgi-profile --export-manifest
;; This "manifest" file can be passed to 'guix package -m' to reproduce
;; the content of your profile. This is "symbolic": it only specifies
;; package names. To reproduce the exact same profile, you also need to
;; capture the channels being used, as returned by "guix describe".
;; See the "Replicating Guix" section in the manual.
(specifications->manifest
  (list "python" "python-numpy"))
Awesome, isn't it? These last two outputs are equivalent to Alice's manifest.scm and channels.scm files. At this stage, Bob’s a happy person: after saving them as new-channels.scm and new-manifest.scm, he can take these two files anywhere and rebuild the exact same image at any time:
guix time-machine -C new-channels.scm \
-- pack -f docker --save-provenance -m new-manifest.scm
The command should produce the exact same docker-pack.tar that Alice provided, bit for bit. If it does not, then either the original image had been tampered with, or one of the package build processes involved is non-deterministic, something we would invite you to report as a bug!
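One way to check the bit-for-bit claim is to compare cryptographic digests of Alice's tarball and the rebuilt one. The following is a minimal sketch using only the standard library; the function names are ours and are not part of Guix, and any byte-wise comparison tool would do the same job.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        # Read until an empty bytes object signals end of file.
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_bits(path_a, path_b):
    """Return True when both files have the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)
```

Calling same_bits on the two tarballs returns True when they are identical; a False result points at one of the two causes named above.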
Join the fun, join us!
Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).