Supporting academic conference artifact evaluation

Emmanuel Agullo, Ludovic Courtès, Romain Garbage, Florent Pruvost, Philippe Swartvagher — July 26, 2024

Having promoted Guix as one of the tools to support reproducible research workflows, we are happy that it is now officially presented as one way to produce and review software artifacts that accompany articles submitted to SuperComputing 2024 (SC24), the leading HPC conference. In this post we look at what this entails and reflect on the role of reproducible software deployment in conference artifact evaluation.

Logo of SuperComputing 2024.

Artifact evaluation at SuperComputing 2024

Like many other conferences, SuperComputing has had a Reproducibility Initiative for several years now. The conference prides itself on being a leader in tangible progress towards scientific rigor, through its pioneering practice of enhanced reproducibility of accepted papers. It is a sad reality that scientific rigor has long been lacking when it comes to the reproducibility of computational experiments, and we can only applaud efforts to rectify this. Artifact review badges such as those introduced more than a decade ago by the Association for Computing Machinery (ACM) were a step in the right direction and an inspiration for many computer science conferences.

The artifact evaluation guidelines of SC24 suggest three ways in which authors can provide software artifacts in a way that eases their evaluation by reviewers:

  1. Providing instructions to build the software, ideally tested on one of the Chameleon Cloud images provided by the Artifact Evaluation Committee.
  2. If the first option isn't practical, giving access to the author’s own computational resources.
  3. Optionally, and for the first time this year, using Guix to provide metadata to deploy and run the computational experiment.

The SC24 guidelines further state:

This year’s initiative proposes the optional use of Guix, a software tool designed to support reproducible software deployment. Guix allows [the] deployment of the exact same software environment on different machines and at different points in time, while still retaining complete provenance info. By eliminating almost entirely variability induced by the software environment, Guix gives authors and reviewers more confidence into the results of computational experiments.

Indeed, option 1 amounts to providing manual build instructions: a sequence of commands to build the software. Those instructions unavoidably make implicit assumptions about the software environment, which cannot possibly be fully captured by a short, human-readable sequence of instructions. For example, the instructions might assume that a C compiler is available, that it's "recent enough" to build the package, or that some library is already installed. Such build instructions are bound to fail on a different system or at a different point in time.
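
To make this concrete, here is a hypothetical example of such instructions, of the kind found in many READMEs (not taken from the SC24 guidelines); every line silently relies on tools being present on the machine at hand:

    # Hypothetical build instructions.  They assume a C compiler, 'make',
    # and an MPI implementation are already installed and "recent enough",
    # none of which is actually recorded anywhere.
    ./configure --with-mpi
    make -j
    make install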

This is where Guix can improve the touted scientific rigor. Guix provides the complete software deployment recipe. As shown in our guide to reproducible research papers, providing a pinned channels file and a manifest allows anyone to redeploy the exact same software environment. But it also provides enough freedom to allow for experimentation beyond that predefined environment: given these two files, one can deploy variants of the software environment, for example to study the impact of changing the version of a package, of passing a specific build flag, of applying a patch to a specific component in the stack, and so forth.
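
Concretely, those two files are short. Here is a sketch of what they might look like for a hypothetical experiment; the pinned commit and package names are illustrative, and the channels.scm produced by guix describe additionally records the channel introduction used for authentication, omitted here for brevity:

    ;; channels.scm: pins the exact Guix revision used by the authors
    ;; (illustrative commit).
    (list (channel
            (name 'guix)
            (url "https://git.savannah.gnu.org/git/guix.git")
            (commit "0123456789abcdef0123456789abcdef01234567")))

    ;; manifest.scm: the packages the experiment depends on
    ;; (hypothetical selection).
    (specifications->manifest
     (list "gcc-toolchain" "openmpi" "hwloc" "python"))

Given these two files, redeploying the exact environment, or a variant of it, is a single command; the hwloc version below is purely illustrative:

    # Exact same environment as the authors':
    guix time-machine -C channels.scm -- shell -m manifest.scm

    # Same environment, except for a different hwloc version (assuming
    # that version is available upstream):
    guix time-machine -C channels.scm -- \
      shell -m manifest.scm --with-version=hwloc=2.9.3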

The SC24 guidelines link to a guide that we wrote to help authors who wish to ensure reproducible deployment of their software environment with Guix. It builds upon our earlier guide, explaining how to write package manifests and deploy them with guix shell, how to pin channels with guix describe, and how to jump to those pinned channels using guix time-machine. It also provides tips and tricks that are crucial in HPC, from MPI to GPU usage.
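
On the authors' side, the workflow described in that guide boils down to a couple of commands along these lines, assuming a manifest.scm such as the one above sits next to the experiment's code:

    # Enter (and thereby test) the environment described by the manifest:
    guix shell -m manifest.scm

    # Record the exact Guix revision in use, to ship alongside the artifact:
    guix describe -f channels > channels.scm

    # Reviewers then replay the environment with 'guix time-machine', as
    # shown above, on any machine where Guix is installed.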

Going further

We are glad reproducible deployment makes its first appearance in the artifact evaluation guidelines of a major conference. This is just one computer science conference, but one in a particularly demanding field. Surely, if this can be done in HPC, it can be adapted to other conferences.

While conferences are increasingly taking software deployment into account, the go-to approach, better than nothing, is to ask authors to provide binary bundles: Docker or virtual machine (VM) images. Undoubtedly, this greatly facilitates artifact evaluation, since author-provided code can be readily executed, but it does so at the expense of transparency and of experimentation. Our goal is to raise awareness, within the reproducible research and open science community, of this fundamental limitation, and to add reproducible software deployment to our "best practices" book. "Scientific rigor" demands more than the bits of the binaries used in our computational experiments.

If you are part of an artifact evaluation committee, we would love to hear from you!

