Back to the future: modules for Guix packages

Ludovic Courtès — May 6, 2022

Some things in our software world are timeless, and the venerable Environment Modules is one of them. If you’ve used a high-performance cluster at any point in the last three decades, chances are you’re already familiar with it. Modules is about managing software environments, just like Guix is or, perhaps more accurately, just like guix shell.

You will be delighted, or surprised, to learn that Guix now has a compatibility layer with Modules.

The legacy of Modules

As Furlani’s 1991 introductory paper explains, Modules were, and still are, a key enabler for Unix users, especially in high-performance computing (HPC). The module command lets users manipulate their software environment in terms of packages, without having to be Unix or shell experts; it lets them compose packages and build the software environment of their choice without interfering with other users; and it gives them a level of flexibility that Unix alone wouldn’t provide. The command-line interface is easily understood:

module load gcc/11.2

“loads” GCC 11.2 in your shell. You can “load” and “unload” software components at will:

module load python/3.8
module unload gcc
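
Under the hood, “loading” a module mostly comes down to adjusting environment variables such as PATH, and “unloading” undoes that change. The sketch below mimics this in plain shell. It is only an illustration, not how Modules is implemented; the path_prepend and path_remove helpers and the /opt/gcc-11.2/bin directory are made up for the example.

```shell
# Hypothetical helpers illustrating what "module load" and "module unload"
# roughly do to a colon-separated search path such as PATH.

# Print LIST ($2) with DIR ($1) prepended, unless DIR is already in it.
path_prepend() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;      # already present: unchanged
    "::")     printf '%s\n' "$1" ;;      # empty list: DIR alone
    *)        printf '%s\n' "$1:$2" ;;
  esac
}

# Print LIST ($2) with every entry equal to DIR ($1) removed.
path_remove() {
  printf '%s' "$2" | tr ':' '\n' | grep -Fxv -- "$1" | paste -sd ':' -
}

# Made-up directory standing in for wherever a sysadmin installed GCC.
gcc_bin=/opt/gcc-11.2/bin

loaded=$(path_prepend "$gcc_bin" "/usr/bin:/bin")    # like "module load"
unloaded=$(path_remove "$gcc_bin" "$loaded")         # like "module unload"
printf 'loaded:   %s\nunloaded: %s\n' "$loaded" "$unloaded"
```

Real module files can set or extend any number of variables, not just PATH, which is precisely why their content matters so much for reproducibility.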

As an interface, Modules are easy to use and understand. However, they leave it up to sysadmins (and sometimes users) to actually deploy the software. The common approach has been for sysadmins to build and install, by themselves, the software that Modules refer to. The end result is that modules vary from machine to machine. For example, the gcc module shown above might refer to GCC 11.2 on one cluster and GCC 8 on another; it might have an entirely different name on a third cluster. Likewise, the python/3.8 module above might refer to different patch-level versions of Python 3.8, or it might refer to a variant of Python built with different dependencies or different build flags.

These issues have been largely mitigated by package managers such as EasyBuild and Spack: both automate package builds, and both can generate module files—Tcl snippets that define environment variables to set when “loading” a module. With EasyBuild and Spack, it becomes possible to not only automate deployment and module file generation, but also to deploy similar software on different machines.

“Similar”, though, does not mean “the same”. Software built with Spack or EasyBuild depends on software already available on the host system: it is built on top of a GNU/Linux distribution, which could be CentOS 7.4 (released in 2017), or Ubuntu 22.04, or really anything else. Thus, software installed with these tools depends on software provided by the underlying distribution, at build time and at run time.

This “hidden dependency” makes it hard to redeploy the exact same environment on a different machine or at a different point in time: the same build process might fail, or it might succeed but the resulting software might behave differently. Our approach in Guix is to not have that “hidden dependency”. Instead, the package dependency graph that Guix manipulates is self-contained: it includes package definitions for all the user-land software one may use.

From Guix to Modules

The news today is the release of Guix-Modules, a new tool to generate module files from Guix packages. The primary goal, as with the module file generation tools in EasyBuild and Spack, is to make it easy for HPC cluster sysadmins to provide a set of modules for their users—more on that below. Guix-Modules is an extension of Guix. To use it, you need to install it and to set the GUIX_EXTENSIONS_PATH environment variable, like so:

guix install guix-modules
export GUIX_EXTENSIONS_PATH="$HOME/.guix-profile/share/guix/extensions"

That gives you a new guix module sub-command.

Let’s say you want to generate modules to /opt/modules for selected packages; you can do so by running:

guix module create -o /opt/modules \
  coreutils gcc-toolchain python python-numpy

As with all Guix commands, it builds or downloads the packages if they’re not around already and populates /opt/modules with a bunch of module files. If /opt/modules already existed, it is backed up under /var/guix/profiles, which lets you roll back to the previous modules should you regret your changes.

As an admin, you can periodically update the set of modules by running:

guix pull
guix module create -o /opt/modules …

The good thing is that users can still access the previous module set, until you explicitly remove it, under /var/guix/profiles.

Instead of having those long guix module create command lines, you can opt for listing the packages of interest in a manifest file, which you can keep under version control. As with most other guix commands, you can pass the manifest with:

guix module create -m my-modules.scm -o /opt/modules
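
For reference, a manifest is a small Scheme file. A minimal one can simply call specifications->manifest, a standard Guix procedure, on a list of package names. The manifest_for helper below is hypothetical: it only prints such a manifest for the packages given as arguments, which may be handy when converting an existing guix module create command line.

```shell
# Hypothetical helper: print a minimal Guix manifest listing the packages
# given as arguments.
manifest_for() {
  printf '(specifications->manifest\n  (list'
  for pkg in "$@"; do
    printf ' "%s"' "$pkg"
  done
  printf '))\n'
}

# Possible usage, redirecting the result to a file kept under version control:
#   manifest_for coreutils gcc-toolchain python python-numpy > my-modules.scm
#   guix module create -m my-modules.scm -o /opt/modules
```

The helper does no validation of package names; Guix itself reports unknown packages when it evaluates the manifest.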

Once the modules have been generated, you can happily load and unload them using the familiar module sub-commands:

unset MODULEPATH
module use /opt/modules
module load gcc-toolchain/11.2.0
module load python/3.9.9

Voilà! If you’re a sysadmin, here’s a new way to offer scientific software to your users without asking them to change their habits. The generated module files work equally well with the “original” Modules implementation and with Lmod.

Provenance tracking

Since we, Guix developers, pride ourselves on providing a deployment tool with good support for provenance tracking, we couldn’t just let that guix module command generate module files of unclear provenance. Users—we think—ought to be able to determine the provenance of the modules they use. We want to avoid the scenario many HPC practitioners are familiar with whereby, six months after publishing an article, you can no longer reproduce the computational results it contains because the relevant modules have been upgraded or removed from under your feet and you just don’t know how to reproduce them.

Thus, guix module create records provenance data in the module files it generates. You can view that info by running module help:

$ module help openblas

----------- Module Specific Help for 'openblas/0.3.18' ------------

This module was generated from a GNU Guix package.
Provenance data (channels):

  (list (channel
          (url "https://git.savannah.gnu.org/git/guix.git")
          (branch "master")
          (commit
            "4ba35ccd18f90314caa76ea1833ffc383559401c")
          (name 'guix)
          (introduction
            (make-channel-introduction
              "9edb3f66fd807b096b48283debdcddccfea34bad"
              (openpgp-fingerprint
                "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA")))))

What module help shows is the list of channels from which this particular package was built. The information is in a format that guix time-machine can readily consume. Assuming you store the (list (channel …)) snippet in file channels.scm, you can go to another machine, at a later point in time, and deploy the exact same software with this command:

guix time-machine -C channels.scm -- \
  shell gcc-toolchain openblas

For users, it makes a big difference: modules are no longer ephemeral—they’re now a reproducible artifact that you can redeploy with Guix anywhere, anytime.
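
To avoid copy-pasting, the channel snippet can be pulled out of the module help output with a small filter. This is a rough sketch: extract_channels is a hypothetical helper, and it assumes the output layout shown above, where the snippet starts at the “(list (channel” line and ends at the next blank line.

```shell
# Hypothetical filter: read "module help" output on standard input and
# print only the channel snippet, from the "(list (channel" line up to
# (but not including) the next blank line.
extract_channels() {
  awk '
    /^[[:space:]]*\(list \(channel/ { inside = 1 }
    inside && /^[[:space:]]*$/      { exit }
    inside                          { print }
  '
}

# Possible usage. The 2>&1 is there in case the module command prints
# help on standard error rather than standard output:
#   module help openblas 2>&1 | extract_channels > channels.scm
#   guix time-machine -C channels.scm -- shell gcc-toolchain openblas
```

It is worth checking the resulting channels.scm by eye before relying on it, since the filter knows nothing about Scheme syntax.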

Customization

HPC users are often demanding when it comes to customizing software build processes. Guix supports this need with a gamut of package transformation options available from the command line as well as through programming interfaces. Good news: guix module create honors package transformation options.

Among those, the --tune option, which instructs Guix to optimize relevant packages for the host micro-architecture, may come in handy. If you know your cluster contains only Skylake CPUs, you’d rather make sure relevant packages are optimized for Skylake. To do that, you would run, say:

guix module create --tune=skylake \
  gcc-toolchain openblas gsl

In this particular case, GSL gets built for Skylake, using GCC’s -march=skylake option (OpenBLAS itself chooses optimized routines at run time so it is unaffected).

“But what about reproducibility?”, you ask. The chosen package transformation option(s)—--tune in this case—are also recorded as part of the provenance data. This is what module help reports:

$ module help gsl

----------- Module Specific Help for 'gsl/2.7' --------------------

This module was generated from a GNU Guix package.
Provenance data (channels):

  (list (channel
          (url "https://git.savannah.gnu.org/git/guix.git")
          (branch "master")
          (commit
            "4ba35ccd18f90314caa76ea1833ffc383559401c")
          (name 'guix)
          (introduction
            (make-channel-introduction
              "9edb3f66fd807b096b48283debdcddccfea34bad"
              (openpgp-fingerprint
                "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA")))))

Package transformations:

  ((tune . "skylake"))

The “Package transformations” bit is self-explanatory; it can be passed as-is to options->transformation in a manifest.
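
As an illustration, here is what such a manifest might look like. The tuned_manifest_for helper is hypothetical: it prints a manifest that applies the recorded transformation to each listed package, following the options->transformation pattern documented in the Guix manual. Treat it as a sketch of the idea rather than the exact manifest you would need.

```shell
# Hypothetical helper: print a manifest that applies a (tune . CPU)
# transformation to each package. $1 is the CPU name, the remaining
# arguments are package names. (quote ...) is used instead of the usual
# Scheme apostrophe only to keep shell quoting simple.
tuned_manifest_for() {
  cpu=$1
  shift
  printf '(use-modules (guix transformations))\n\n'
  printf '(define transform\n'
  printf '  (options->transformation (quote ((tune . "%s")))))\n\n' "$cpu"
  printf '(packages->manifest\n  (list'
  for pkg in "$@"; do
    printf '\n    (transform (specification->package "%s"))' "$pkg"
  done
  printf '))\n'
}

# Possible usage:
#   tuned_manifest_for skylake gcc-toolchain openblas gsl > tuned-modules.scm
#   guix module create -m tuned-modules.scm -o /opt/modules
```

Combined with the recorded channels, a manifest along these lines should be enough to rebuild the same tuned packages later on.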

We strongly believe one shouldn’t have to choose between performance and reproducibility, and this feature set is designed so that you don’t have to.

Why all the fuss?

Guix is ten years old and Guix-HPC itself is turning five this year, so you might wonder why, after all these years, we’re adding a Modules compatibility layer. After all, guix shell can set up software environments on-the-fly in a way that is comparable to module load. For instance, to start a shell to use GCC and Python as in the example above, you would type:

guix shell gcc-toolchain@11 python@3.8

More generally, Guix puts users in control: it lets them upgrade when they want to and allows them to travel in time; it lets them customize packages, and it lets them replicate the same environment elsewhere or at a different point in time.

Using Guix directly remains the most empowering approach for users, but module files created from Guix packages can satisfy a number of user needs:

  1. Matching user habits. For some communities, not having to learn a new command—even if it’s not all that different, even if it has more to offer—is a big plus. It’s not uncommon for cluster admins to offer Modules in addition to Guix or other tools for that reason.
  2. Supporting incremental software environment construction. With module, you can “load” and “unload” modules until you obtain the desired environment, whereas guix shell currently expects a list of packages upfront. While exploring a problem space, the incremental mode might be more convenient—and indeed, patches have recently been discussed to support an incremental mode in guix shell.
  3. Supporting simple Guixy cluster setups. The typical Guix cluster setup requires running the build daemon, ensuring it can access the network to download source or binaries, making it accessible to front nodes and (optionally) build nodes, and setting up a couple of NFS exports. Sysadmins who’d rather not do that can instead use guix module create and offer those modules to users. The /gnu/store directory still needs to be exported over NFS, but that’s a read-only export, and it’s all that’s needed, which makes for a simpler setup.

If you’re an HPC cluster user or system administrator, we’d love to hear your thoughts on the guix-science mailing list or on the #guix-hpc channel on Libera.chat!

Unless otherwise stated, blog posts on this site are copyrighted by their respective authors and published under the terms of the CC-BY-SA 4.0 license and those of the GNU Free Documentation License (version 1.3 or later, with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts).
