Faster relocatable packs with Fakechroot
The guix pack command creates “application bundles” that can be used to deploy
software on machines that do not run Guix (yet!), such as HPC clusters. Since
its inception, it has seen a number of improvements, such as the ability to create
Docker and Singularity container images. Some clusters lack these
tools, though, and the addition of relocatable packs
was a way to address that. This post looks at a new execution engine
for relocatable packs that has just
landed, with the goal of improving performance.
Before we get into that, let’s recap how relocatable packs work.
Essentially, a relocatable pack is a plain old tarball that contains the applications of your choosing along with all their dependencies, such that you can run them on any GNU/Linux machine. To create a pack containing Python and NumPy, run:
guix pack -RR python python-numpy -S /bin=bin
The -RR flag asks for the creation of what we jokingly refer
to as a reliably relocatable pack
(more on that below), while the
-S flag asks for the creation of a
/bin symbolic link in the tarball.
The result of that command is a tarball that you can send on another machine, unpack, and then run Python directly from there without any special privileges:
tar xf pack.tar.gz
./bin/python
That’s it! All you need on the target machine is
tar, and the pack takes care of the rest.
Relocation with PRoot
guix pack -R (with a single
-R) creates relocatable packs that
require kernel support for unprivileged user namespaces.
However, some systems have them disabled, and older systems do not
support them at all—the
./bin/python command above wouldn’t work on such systems.
The -RR option we saw above adds a universal fallback option: on a
system where unprivileged user namespaces are not available, the
./bin/python command above automatically falls back to using
PRoot. PRoot achieves
file system virtualization by intercepting the process’s system calls
via ptrace.
The advantage is that it always works—it doesn’t rely on any special
kernel feature: ptrace has “always been there”, so to speak. The
drawback is that it incurs significant overhead at every system call.
This is acceptable for an interactive program, or, say, for a
single-threaded number-crunching application. But the performance hit
is prohibitive, for example, for an MPI or multi-threaded
application—input/output and synchronization happen via system calls.
To address this performance issue, we have just added a third execution
engine to relocatable packs, relying on Fakechroot and ld.so
trickery. Users of relocatable packs can now choose an
execution engine at run time by setting the
GUIX_EXECUTION_ENGINE environment variable. If you choose the
performance engine, the application will
use user namespaces or, if they are not supported, fall back to
fakechroot.
guix pack -RR wraps the application executables, in this case
python. Those wrappers are small statically-linked programs that
implement the execution engines.
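As a rough illustration, the wrapper’s engine selection could look like the sketch below. This is hypothetical code, not the actual wrapper source: the function names and the probing strategy are our assumptions; only the GUIX_EXECUTION_ENGINE variable and the “user namespaces, else PRoot” fallback order come from the text.

```c
/* Hypothetical sketch of a pack wrapper's engine-selection logic. */
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/* Return nonzero if unprivileged user namespaces work.  Probe in a
   child process so the parent's namespaces are left untouched. */
static int userns_supported(void) {
  pid_t pid = fork();
  if (pid == 0)
    _exit(unshare(CLONE_NEWUSER | CLONE_NEWNS) == 0 ? 0 : 1);
  if (pid < 0)
    return 0;
  int status;
  if (waitpid(pid, &status, 0) < 0)
    return 0;
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Pick an engine, honoring an explicit GUIX_EXECUTION_ENGINE request;
   otherwise default to user namespaces with a PRoot fallback. */
const char *choose_engine(void) {
  const char *engine = getenv("GUIX_EXECUTION_ENGINE");
  if (engine != NULL && *engine != '\0' && strcmp(engine, "default") != 0)
    return engine;
  return userns_supported() ? "userns" : "proot";
}
```

The real wrapper would then execute the bundled program under whichever engine was chosen.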
The fakechroot engine works like this:
1. The PT_INTERP segment of the wrapped executable contains the file name of the dynamic linker, ld.so.
2. Since that file name under /gnu/store doesn’t exist on the host machine, the dynamic linker is invoked directly, with its file name computed relative to the wrapper’s file name.
3. The loader is told to preload the Fakechroot shared library, which interposes on the file system functions of the C library (open, stat, etc.) and “translates” /gnu/store absolute file names to their actual location.
4. The RUNPATH of Guix executables and shared libraries lists the /gnu/store directories that contain the libraries they depend on. However, the lookups ld.so itself makes are not interposable, so Fakechroot doesn’t help here. Fortunately, the little-known audit interface of the GNU dynamic linker comes in handy: its la_objsearch hook allows you to alter the way ld.so looks for shared libraries. Thus, a few lines of C are all it takes to get ld.so to translate /gnu/store file names. Neat!
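The interposition on C library calls described above can be sketched as follows. This is a toy, Linux-only illustration, not the real Fakechroot code: the translate helper is ours, FAKECHROOT_BASE stands in for however the unpack directory is communicated, and the actual library wraps many more entry points.

```c
/* Toy sketch of libc interposition that rewrites /gnu/store names. */
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Map an absolute /gnu/store name to its location under the unpacked
   pack; any other name passes through unchanged. */
static const char *translate(const char *name, char *buf, size_t len) {
  const char *root = getenv("FAKECHROOT_BASE");  /* unpack directory */
  if (root != NULL && strncmp(name, "/gnu/store/", 11) == 0) {
    snprintf(buf, len, "%s%s", root, name);
    return buf;
  }
  return name;
}

/* Interposed open(): translate the file name, then go straight to the
   kernel so we don't recurse into our own wrapper. */
int open(const char *path, int flags, ...) {
  char buf[4096];
  mode_t mode = 0;
  if (flags & O_CREAT) {
    va_list ap;
    va_start(ap, flags);
    mode = va_arg(ap, mode_t);
    va_end(ap);
  }
  return (int)syscall(SYS_openat, AT_FDCWD,
                      translate(path, buf, sizeof buf), flags, mode);
}
```

Built as a shared object and preloaded, such a wrapper sees every open() the application makes.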
The fakechroot engine incurs very little overhead, and only on file
system function calls, making it a great option for HPC workloads. The
default engine remains user namespaces with a fallback to PRoot, so be
sure to set
GUIX_EXECUTION_ENGINE=performance. See the manual
for more info.
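The la_objsearch hook mentioned above might look roughly like this. It is a sketch in the spirit of what the fakechroot engine does, not the actual Guix code; the FAKECHROOT_BASE variable and the translation scheme are illustrative.

```c
/* Sketch of an ld.so audit module that redirects /gnu/store lookups. */
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Mandatory hook: tell ld.so which audit interface version we speak. */
unsigned int la_version(unsigned int version) {
  (void)version;
  return LAV_CURRENT;
}

/* Called by ld.so for every shared-library lookup; rewrite /gnu/store
   prefixes so libraries resolve inside the unpacked pack. */
char *la_objsearch(const char *name, uintptr_t *cookie, unsigned int flag) {
  (void)cookie; (void)flag;
  static char buf[4096];
  const char *root = getenv("FAKECHROOT_BASE");  /* unpack directory */
  if (root != NULL && strncmp(name, "/gnu/store/", 11) == 0) {
    snprintf(buf, sizeof buf, "%s%s", root, name);
    return buf;
  }
  return (char *)name;
}
```

Compiled with cc -shared -fPIC and loaded through the LD_AUDIT environment variable, this module is consulted by ld.so at every library lookup.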
A call to HPC system administrators
guix pack -RR allows you to deploy software stacks on a Guix-less
cluster that lacks both support for unprivileged user namespaces and a
container facility such as Singularity, without loss of performance.
A similar combination of execution engines for unprivileged users can be
found in udocker, though the
tool has different goals. Having discussed these techniques,
it’s good to take a step back and look at the bigger picture.
All these shenanigans would be unnecessary if unprivileged user
namespaces were universally available. In fact, when we released
guix pack -R two years ago,
we thought (hoped?) that widespread availability of unprivileged user
namespaces was imminent. After all, the feature had already been
available in the Linux kernel since version 3.8, released in 2013.
Unfortunately, today, major academic HPC clusters still run a derivative of Red Hat Enterprise Linux (RHEL) or CentOS 7, released in 2015 with Linux 3.10, where the decision was made to disable user namespaces. Fortunately, RHEL 8 and its derivatives document an easy way to enable them.
We encourage HPC system administrators to consider enabling unprivileged user namespaces. They allow unprivileged users to deploy pre-built software, be it through a relocatable Guix pack or via container run-time support tools like runC, with virtually no overhead. More generally, user namespaces enable reproducible software environments, a prerequisite for reproducible scientific experiments!
Many thanks to Carlos O’Donell, steward for the GNU C Library, for
reviewing initial revisions of the
fakechroot execution engine and for
suggesting the use of the
ld.so audit interface.