Back to the future: modules for Guix packages
Some things in our software world are timeless. The venerable
Environment Modules are one of these.
If you’ve used a high-performance cluster at any point in the last three
decades, chances are you’re already familiar with them. Modules is about
managing software environments, just like Guix is.
You will be delighted, or surprised, to learn that Guix now has a compatibility layer with Modules.
The legacy of Modules
As Furlani’s 1991 introductory paper makes clear, Modules were, and still are, a key enabler for Unix users, especially in high-performance computing (HPC). The module command lets users manipulate their software environment in terms of packages, without having to be Unix or shell experts; it lets them compose packages and build the software environment of their choice without interfering with other users; and it gives them a level of flexibility that Unix alone wouldn’t provide. The command-line interface is easily understood:
module load gcc/11.2
“loads” GCC 11.2 in your shell. You can “load” and “unload” software components at will:
module load python/3.8
module unload gcc
As an interface, Modules are easy to use and understand.
However, they leave it up to sysadmins (sometimes users) to
actually deploy the software. The common approach has been for
sysadmins to build and install, by themselves, the software that
Modules refer to. The end result is that modules vary from machine to
machine. For example, the gcc module shown above might refer to GCC 11.2 on one cluster and GCC 8 on another; it might have an entirely different name on a third cluster. Likewise, the python/3.8 module above might refer to different patch-level versions of Python 3.8, or it might refer to a variant of Python built with different dependencies or different build flags.
These issues have been largely mitigated by package managers such as EasyBuild and Spack: both automate package builds, and both can generate module files—Tcl snippets that define environment variables to set when “loading” a module. With EasyBuild and Spack, it becomes possible to not only automate deployment and module file generation, but also to deploy similar software on different machines.
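To make the notion of “loading” concrete, here is a minimal shell sketch of what a module file boils down to: setting environment variables, and undoing that on “unload”. It is only an illustration, not how Modules is implemented; the sketch_load and sketch_unload functions and the /opt/sketch/gcc-11.2 prefix are hypothetical names made up for this example.

```shell
#!/bin/sh
# Conceptual sketch only: real module files are Tcl snippets interpreted
# by the "module" command. All names below are hypothetical.

# "Load": prepend the package's bin directory to PATH.
sketch_load() {
    PATH="$1/bin:$PATH"
    export PATH
}

# "Unload": drop every PATH entry equal to that bin directory.
sketch_unload() {
    PATH=$(printf '%s\n' "$PATH" | tr ':' '\n' \
        | grep -v -x -F -- "$1/bin" | paste -s -d ':' -)
    export PATH
}

saved_path=$PATH
sketch_load /opt/sketch/gcc-11.2
loaded_path=$PATH
echo "after load:   $PATH"
sketch_unload /opt/sketch/gcc-11.2
echo "after unload: $PATH"
```

A real module file typically touches more variables than PATH, but the principle is the same: the environment change is reversible, which is what makes “load” and “unload” possible.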
“Similar”, though, does not mean “the same”. Software built with Spack or EasyBuild depends on software already available on the host system: it is built on top of a GNU/Linux distribution, which could be CentOS 7.4 (released in 2017), or Ubuntu 22.04, or really anything else. Thus, software installed with these tools depends on software provided by the underlying distribution, at build time and at run time.
This “hidden dependency” makes it hard to redeploy the exact same environment on a different machine or at a different point in time: the same build process might fail, or it might succeed but the resulting software might behave differently. Our approach in Guix is to not have that “hidden dependency”. Instead, the package dependency graph that Guix manipulates is self-contained: it includes package definitions for all the user-land software one may use.
From Guix to Modules
The news today is the release of Guix-Modules, a new tool to generate module files from Guix packages. The primary goal, as with the module file generation tools in EasyBuild and Spack, is to make it easy for HPC cluster sysadmins to provide a set of modules for their users (more on that below). Guix-Modules is an extension of Guix. To use it, you need to install it and set the GUIX_EXTENSIONS_PATH environment variable:
guix install guix-modules
export GUIX_EXTENSIONS_PATH="$HOME/.guix-profile/share/guix/extensions"
That gives you a new guix module sub-command.
Let’s say you want to generate modules in /opt/modules for selected packages; you can do so by running:
guix module create -o /opt/modules \
  coreutils gcc-toolchain python python-numpy
As with all Guix commands, it will build or download the packages if they’re not around already, and populate /opt/modules with a bunch of module files. If /opt/modules already existed, it has been backed up under /var/guix/profiles, which lets you roll back to the previous modules should you regret your changes.
As an admin, you can periodically update the set of modules by running:
guix pull
guix module create -o /opt/modules …
The good thing is that users can still access the previous module set, until you explicitly remove it, under /var/guix/profiles.
Instead of having those long guix module create command lines, you can opt for listing the packages of interest in a manifest, which you can keep under version control. As with most other guix commands, you can pass the manifest with -m:
guix module create -m my-modules.scm -o /opt/modules
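For illustration, here is one way the manifest passed above could be written. This is a sketch, assuming you want the same four packages as in the earlier command line; it only writes my-modules.scm to the current directory and relies on the specifications->manifest procedure that Guix provides for manifests.

```shell
#!/bin/sh
# Write a manifest listing the packages to turn into modules.
# The package list mirrors the earlier "guix module create" example.
cat > my-modules.scm <<'EOF'
;; Manifest meant to be passed to "guix module create -m".
(specifications->manifest
 '("coreutils" "gcc-toolchain" "python" "python-numpy"))
EOF

echo "wrote my-modules.scm"
```

Keeping this file in a Git repository gives you a history of which packages were offered as modules, and when.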
Once the modules have been generated, you can happily load and unload them using the familiar module command:
unset MODULEPATH
module use /opt/modules
module load gcc-toolchain/11.2.0
module load python/3.9.9
Voilà! If you’re a sysadmin, here’s a new way to offer scientific software to your users without asking them to change their habits. The generated module files work equally well with the “original” Module implementation and with Lmod.
Since we, Guix developers, pride ourselves on providing a deployment tool with good support for provenance tracking, we couldn’t just let the guix module command generate module files of unclear provenance. Users, we think, ought to be able to determine the provenance of the modules they use. We want to avoid the scenario many HPC practitioners are familiar with whereby, six months after publishing an article, you can no longer reproduce the computational results it contains because the relevant modules have been upgraded or removed from under your feet and you just don’t know how to reproduce them.
That is why guix module create records provenance data in the module files it generates. You can view that info by running module help:
$ module help openblas
----------- Module Specific Help for 'openblas/0.3.18' ------------
This module was generated from a GNU Guix package.
Provenance data (channels):
(list (channel
        (url "https://git.savannah.gnu.org/git/guix.git")
        (branch "master")
        (commit "4ba35ccd18f90314caa76ea1833ffc383559401c")
        (name 'guix)
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
What module help shows is the list of channels from which this particular package was built. The information is in a format that guix time-machine can readily consume. Assuming you have saved the (list (channel …)) snippet in file channels.scm, you can go to another machine, at a later point in time, and deploy the exact same software with this command:
guix time-machine -C channels.scm -- \
  shell gcc-toolchain openblas
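As a concrete sketch of the “save the snippet” step, the following writes the provenance data from the module help output shown earlier into channels.scm in the current directory. The echo and sed lines at the end are only a convenience added for this example to display the pinned commit.

```shell
#!/bin/sh
# Save the channel list reported by "module help openblas" so that
# "guix time-machine -C channels.scm" can consume it later.
cat > channels.scm <<'EOF'
(list (channel
        (url "https://git.savannah.gnu.org/git/guix.git")
        (branch "master")
        (commit "4ba35ccd18f90314caa76ea1833ffc383559401c")
        (name 'guix)
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
EOF

# Show which commit the file pins (plain text extraction, no Guix needed).
pinned=$(sed -n 's/.*(commit "\([0-9a-f]*\)").*/\1/p' channels.scm)
echo "channels.scm pins Guix commit $pinned"
```

Storing channels.scm next to the article or analysis it supports is what lets you come back to the same software months later.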
For users, it makes a big difference: modules are no longer ephemeral—they’re now a reproducible artifact that you can redeploy with Guix anywhere, anytime.
HPC users are often demanding when it comes to customizing software build processes. Guix supports this need with a gamut of package transformation options available from the command line as well as through programming interfaces. The guix module create command honors package transformation options.
Among those, the --tune option, which instructs Guix to optimize relevant packages for the host CPU, may come in handy. If you know your cluster contains only Skylake CPUs, you’d rather make sure relevant packages are optimized for Skylake. To do that, you would run, say:
guix module create --tune=skylake \
  gcc-toolchain openblas gsl
“But what about reproducibility?”, you ask. The chosen package transformation options (--tune in this case) are also recorded as part of the provenance data. This is what module help reports:
$ module help gsl
----------- Module Specific Help for 'gsl/2.7' --------------------
This module was generated from a GNU Guix package.
Provenance data (channels):
(list (channel
        (url "https://git.savannah.gnu.org/git/guix.git")
        (branch "master")
        (commit "4ba35ccd18f90314caa76ea1833ffc383559401c")
        (name 'guix)
        (introduction
          (make-channel-introduction
            "9edb3f66fd807b096b48283debdcddccfea34bad"
            (openpgp-fingerprint
              "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA")))))
Package transformations:
((tune . "skylake"))
The “Package transformations” bit is self-explanatory; it can be passed as-is to the corresponding Guix procedure in a manifest.
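Here is a hedged sketch of what reusing that value in a manifest might look like. It assumes the procedure in question is options->transformation from the (guix transformations) module, which takes exactly this kind of association list; the file name tuned-modules.scm and the package list are chosen for this example only.

```shell
#!/bin/sh
# Write a manifest that reapplies the recorded transformation.
cat > tuned-modules.scm <<'EOF'
;; Hypothetical manifest reusing the "Package transformations" value
;; reported by "module help".
(use-modules (guix transformations))

(define transform
  ;; Assumption: options->transformation is the procedure meant above.
  (options->transformation '((tune . "skylake"))))

(packages->manifest
 (map (lambda (name)
        (transform (specification->package name)))
      '("gcc-toolchain" "openblas" "gsl")))
EOF

echo "wrote tuned-modules.scm"
```

Combined with a saved channels.scm, a manifest along these lines should capture both where the packages came from and how they were customized.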
We strongly believe one shouldn’t have to choose between performance and reproducibility and this is what this feature set supports.
Why all the fuss?
Guix is ten years old and Guix-HPC itself is turning five this year, so you might wonder why, after all these years, we’re adding a Modules compatibility layer. After all, guix shell can set up software environments on-the-fly in a way that is comparable to module load. For instance, to start a shell to use GCC and Python as in the example above, you would type:
guix shell gcc-toolchain@11 python
More generally, Guix puts users in control: it lets them upgrade when they want to and allows them to travel in time; it lets them customize packages, and it lets them replicate the same environment elsewhere or at a different point in time.
Using Guix directly remains the most empowering approach for users, but module files created from Guix packages can satisfy a number of user needs:
- Matching user habits. For some communities, not having to learn a new command—even if it’s not all that different, even if it has more to offer—is a big plus. It’s not uncommon for cluster admins to offer Modules in addition to Guix or other tools for that reason.
- Supporting incremental software environment construction. With module, you can “load” and “unload” modules until you obtain the desired environment, whereas guix shell currently expects a list of packages upfront. While exploring a problem space, the incremental mode might be more convenient; indeed, patches have recently been discussed to support an incremental mode in guix shell.
- Supporting simple Guixy cluster setups. The typical Guix cluster setup requires running the build daemon, ensuring it can access the network to download source or binaries, making it accessible to front nodes and (optionally) build nodes, and setting up a couple of NFS exports. Sysadmins who’d rather not do that can instead use guix module create and offer those modules to users. The /gnu/store directory still needs to be exported over NFS, but that’s a read-only export, and it’s all that’s needed: a simpler setup.
If you’re an HPC cluster user or system administrator, we’d love to hear
your thoughts on the
guix-science mailing list or