Continuous integration and continuous delivery for HPC
Will those binaries actually work? This is a central question for HPC practitioners, and one that’s sometimes hard to answer: increasingly complex software stacks are being deployed, often on a variety of clusters. Will that program pick the right libraries? Will it perform well? With each cluster having its own hardware characteristics, portability is often considered unachievable. As a result, HPC practitioners rarely take advantage of continuous integration and continuous delivery (CI/CD): building software locally on the cluster is common, and software validation is often a costly manual process that has to be repeated on each cluster.
We discussed before that the use of pre-built binaries is not inherently an obstacle to performance, be it for networking or for code—a property often referred to as performance portability. Thanks to performance portability, continuous delivery is an option in HPC. In this article, we show how Guix users and system administrators have benefited from continuous integration and continuous delivery on HPC clusters.
But first things first: before we talk about continuous integration, we need to talk about hermetic or isolated builds. One of the key insights of the pioneering work of Eelco Dolstra on the Nix package manager is this: by building software in isolated environments, we can eliminate interference with the rest of the system and practically achieve reproducible builds. Simply put, if Alice runs a build process in an isolated environment on a supercomputer, and Bob runs the same build process in an isolated environment on their laptop, they’ll get the same output (unless of course the build process is not deterministic).
From that perspective, pre-built binaries in Guix (and Nix) are merely substitutes for local builds: you can choose to build things locally, but as an optimization you may just as well fetch the build result from someone you trust—since it’s the same as what you’d get anyway.
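This equivalence can even be verified. The sketch below, assuming ci.guix.gnu.org is among your configured substitute servers, builds a package locally and then asks guix challenge to compare the result against what substitute servers publish:

```shell
# Build "hello" locally, bypassing substitutes entirely.
guix build --no-substitutes hello

# Compare the locally-built store item against the binaries
# advertised by the substitute server; a deterministic build
# process yields identical hashes, and any discrepancy is
# reported.
guix challenge hello --substitute-urls="https://ci.guix.gnu.org"
```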
A closely related property is full control of the software package
dependency graph. Guix package definitions stand alone: they can only
refer to one another and cannot refer to software that happens to be
available on the machine in
/usr/lib64, say—that directory is not even
visible in the isolated build environment! Thus, a package in Guix has
its dependencies fully specified, down to the C library.
Thanks to hermetic builds and standalone dependency graphs, sharing binaries is safe: by shipping the package and all its dependencies, without making any assumptions on software already available on the cluster, you control what you’re going to run.
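That fully-specified dependency graph is something you can inspect. As a sketch, guix graph renders it in Graphviz format (the hello package is used here purely for illustration):

```shell
# Package-level dependency graph of "hello", rendered to PDF.
guix graph hello | dot -Tpdf > hello-graph.pdf

# The "references" graph type shows run-time dependencies of
# the build result, down to the C library.
guix graph --type=references hello | dot -Tpdf > hello-references.pdf
```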
Continuous integration & continuous delivery
Guix uses continuous integration to build its more than 22,000 packages
on several architectures: x86_64, i686, AArch64, ARMv7, and POWER9. The
project has two independent build farms. The main one, known as
ci.guix.gnu.org, was generously donated by
the Max Delbrück Center for Molecular Medicine
(MDC) in Germany; it has more than twenty
64-core x86-64/i686 build machines and a dozen build machines for the other supported architectures.
The diagram above illustrates the packaging workflow in Guix, which can be summarized as follows:
- packagers write a package definition;
- they test it locally using guix build;
- eventually someone with commit access pushes the changes to the Git repository;
- build farms pull from the repository and build the new package.
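The local testing step above usually amounts to running guix build against the updated definition; a sketch, where ~/src/my-channel and my-package are hypothetical names:

```shell
# Build the package from a local channel checkout; -L adds the
# checkout to the package search path, and --keep-failed keeps
# the build tree around for inspection if the build fails.
guix build -L ~/src/my-channel my-package --keep-failed

# Rebuild and compare to detect non-deterministic builds.
guix build -L ~/src/my-channel my-package --check
```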
Build farms are a quality assurance tool for packagers. For instance,
they let packagers check that a package builds successfully on all the
supported architectures. The ci.guix.gnu.org build farm is driven by
Cuirass. The web interface often
surprises newcomers (it sure looks different from those of Jenkins or
GitLab-CI!), but the key part is that it provides a dashboard that one can
navigate to look for packages that fail to build, fetch build logs, and
retrieve build results.
A big difference with traditional continuous integration tools is that
build results from the build farm are not thrown away: by running
guix publish on the build farm, those binaries are made accessible to Guix users. Any
Guix user may add
ci.guix.gnu.org to their list of substitute URLs,
and they will transparently get binaries from that server.
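Concretely, pointing Guix at an additional substitute server takes two steps: authorizing the server’s signing key and listing its URL. A sketch (the key file name is an assumption; server operators publish their actual key):

```shell
# Authorize the server's signing key so its binaries are trusted.
guix archive --authorize < ci.guix.gnu.org.pub

# One-off usage: fetch substitutes from that server for a single
# command; the URL can also be set system-wide on guix-daemon.
guix install gromacs --substitute-urls="https://ci.guix.gnu.org"
```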
One can check whether pre-built binaries of specific packages are
available on substitute servers by running
$ guix weather gromacs petsc scotch
computing 3 package derivations for x86_64-linux...
looking for 5 store items on https://ci.guix.gnu.org...
https://ci.guix.gnu.org
  ☀  100.0% substitutes available (5 out of 5)
  at least 41.5 MiB of nars (compressed)
  109.6 MiB on disk (uncompressed)
  0.112 seconds per request (0.2 seconds in total)
  8.9 requests per second
looking for 5 store items on https://bordeaux.guix.gnu.org...
https://bordeaux.guix.gnu.org
  ☀  100.0% substitutes available (5 out of 5)
  at least 30.0 MiB of nars (compressed)
  109.6 MiB on disk (uncompressed)
  0.051 seconds per request (0.2 seconds in total)
  19.7 requests per second
That way, one can immediately tell whether deployment will be quick or whether they’ll have to wait for compilation to complete…
Publishing binaries for third-party channels
Our research institutes typically have channels providing packages for their own software or software related to their field. How can they benefit from continuous integration and continuous delivery?
At Inria, we set up a build farm that
runs Cuirass and publishes its binaries with
guix publish. Cuirass is
configured to build the packages of selected channels such as
guix-science (the Guix
manual explains how to set up Cuirass on Guix System; you can also check out the
configuration of this build farm for details). That way, it complements the official
build farms of the Guix project.
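On the serving side, a minimal guix publish invocation might look like this (port, compression, and cache directory are illustrative choices, not Inria’s actual configuration):

```shell
# Serve the local store over HTTP, signing each item with the
# server's key; zstd keeps nar downloads small, and --cache
# pre-compresses items in the background.
guix publish --port=8080 --compression=zstd \
  --cache=/var/cache/guix/publish
```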
The HPC clusters that the teams at Inria use, in particular
PlaFRIM and Grid’5000, are
set up to fetch substitutes from that build farm in
addition to Guix’s default substitute servers. When deploying
packages from our channels on one of these clusters, binaries are
readily available—a significant productivity boost! That also applies
to binaries tuned for a specific CPU micro-architecture.
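Such binaries are produced with Guix’s --tune package transformation option, which rebuilds “tunable” packages for a given CPU micro-architecture; a sketch (the package name is illustrative):

```shell
# Tune for the micro-architecture of the machine at hand.
guix install eigen-benchmarks --tune

# Or target a specific micro-architecture explicitly.
guix install eigen-benchmarks --tune=skylake-avx512
```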
The Grid’5000 setup takes advantage of this flexibility in interesting
ways. Grid’5000 is a “cluster of clusters” with 8 sites, each of which
has its own Guix installation. To share binaries among sites, each site
runs its own guix publish instance, and each site has the other sites in its
list of substitute URLs. That way, if a site has already built, say,
Open MPI, the other sites will transparently fetch Open MPI binaries
from it instead of rebuilding it.
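As a sketch, such a mesh can be achieved by passing each site’s guix-daemon the other sites’ guix publish endpoints (the site URLs below are hypothetical):

```shell
# Hypothetical per-site daemon configuration: substitutes are
# looked up on the official server and on the other sites'
# guix publish instances.
guix-daemon --build-users-group=guixbuild \
  --substitute-urls="https://ci.guix.gnu.org \
    https://guix.site-a.example https://guix.site-b.example"
```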
While Cuirass is a fine continuous integration tool tightly integrated with Guix, it’s also entirely possible to use one of the mainstream tools instead. Here are examples of computing infrastructures that publish pre-built binaries:
- GliCID, the Tier-2 cluster for the region of Nantes (France), builds packages with Cuirass and publishes binaries.
- ZPID publishes binaries of relevant packages built with a simple cron script.
- GeneNetwork runs continuous integration jobs with Laminar and publishes the resulting binaries.
- Phil Beadling of Quantile Technologies explained how they integrated Guix in their Jenkins CI/CD pipeline.
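The cron-based approach can be sketched in a few lines of shell, assuming a guix publish instance is already running on the same host (the package list and log file are illustrative):

```shell
#!/bin/sh
# Periodically build a fixed set of packages; the results land
# in the store, where the running `guix publish` instance makes
# them available as substitutes.
set -e
guix pull                     # update package definitions
guix build gromacs petsc scotch \
  >> /var/log/guix-ci.log 2>&1
```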
As you can see, there’s a whole gamut of possibilities, ranging from the
“low-tech” setup to the fully-featured CI/CD pipeline. In all of these,
guix publish takes care of the publication part. If your focus is on
delivering binaries for a small set of packages, a periodic cron job as
shown above is good enough. If you’re dealing with a large package set
and are also interested in quality assurance, a tool like Cuirass may be
a better fit.
We computer users all too often work in silos. Developers might have their own build and deployment machinery that they use for continuous integration (GitLab-CI with some custom Docker image?); system administrators might deploy software on clusters in their own way (Singularity image? environment modules?); and users might end up running yet other binaries (locally built? custom-made?). We got used to it, but if we take a step back, it looks like this is one and the same activity with a different cloak depending on who you’re talking to.
Guix provides a unified approach to software deployment; building, deploying, publishing binaries, and even building container images all build upon the same fundamental mechanisms. We have seen in this blog post that this makes it easy to continuously build and publish package binaries. The productivity boost is twofold: local recompilation goes away, and site-specific software validation is reduced to its minimum.
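As an illustration of that unification, the very same package definitions that back substitutes can also yield a container image (the package choice is illustrative):

```shell
# Build a Docker-format image containing gromacs and its
# complete dependency graph, ready for `docker load`.
guix pack -f docker gromacs
```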
For HPC practitioners and hardware vendors, this is a game changer.
Thanks to Lars-Dominik Braun, Simon Tournier, and Ricardo Wurmus for their insightful comments on an earlier draft of this post.