📺 Video recordings of the talks (not tutorials) are available below: click on the screen next to each talk.
Wednesday afternoon introduces reproducible software deployment with Guix as a foundation for reproducible research workflows.
- 11:30–13:10 🍽️ lunch (optional)
- 13:10–13:30 👋 Welcome!
Programmers mentally divide code into three layers: their own code, the libraries and tools they interact with, and the environment. Unfortunately, the environment is the layer they care about least. My mission is to convince you that the environment is interesting and worth caring about. Did you know that the environment metaphor is very inaccurate? It's really the foundation supporting your code. And then, there are additional environments you should be aware of: the social environments of developers and users, society at large, the physical environment of our computing systems. Are you ready to become an environmentalist?
The lack of reproducibility of much scientific research, pointed out in 2009 by D. Donoho and many other researchers, not only undermines the credibility of results but also has a direct, negative impact on areas such as public health, safety, or security. There's a consensus on the benefits of reproducibility, but only a minority of researchers are fully committed. We'll review some of the reasons why, and think about how reproducible research could be encouraged and rewarded.
From a practical point of view, we'll discuss good practices for both writing and reviewing reproducible scientific articles. To this end, we'll review good practices related to the execution environments of the software, the versioning of the code, the FAIR principles for data, formats, standards, and the quality of the source code itself.
Reproducible research is necessary to ensure that scientific work can be trusted. Funders and publishers are beginning to require that publications include access to the underlying data and the analysis code. The goal is to ensure that all results can be independently verified and built upon in future work. This is sometimes easier said than done! Sharing these research outputs means understanding data management, library sciences, software development, and continuous integration techniques: skills that are not widely taught or expected of academic researchers. A particularly steep barrier to working with codebases is setting up computational environments, and getting the combination of package versions just right can influence the reproducibility of code: from outright failures, to subtle changes in generated outputs. There are many tools available to manage your computational environment; but in this talk, we’ll explore Project Binder and its subproject repo2docker, which aims to automate reproducibility best practices across a number of ecosystems. Binder can build portable computational environments, when requested, with all the information encoded in a single, clickable URL, which greases the wheels of collaborative research while reducing the toil involved. We will discuss how these concepts can apply to the HPC community.
- 15:45–16:15 ☕ break
Navigating the jungle of reproducible environments can be pretty tough, as there is a myriad of problems to consider: does this language-specific package manager build packages reproducibly? How does it integrate its external dependencies? Is the compiler bootstrapped? Will all the metadata it's using disappear in X years? Will my distributed artifacts work without external assumptions? Do I have tools and guarantees to examine all of this?
With these questions in mind, I will give an overview of Guix, the Swiss Army knife of environments, and describe how it can help you achieve reproducibility. I will also compare it with some other practices I've seen and try to dispel some misconceptions about them, highlighting Guix's strengths.
- 17:00–17:45 Nicolas Vallet (University Hospital of Tours, Hematology and Cell Therapy department, Inserm U1069, LNOx group, France) 📁📺
In the biomedical environment, reproducibility is mainly taught in the setting of bench experiments. Descriptions of data analyses usually focus on basic measures such as mentioning the software version used, sharing the raw data, and occasionally providing a partial script. This presentation aims to showcase how we went from version labels to Guix and how it shaped our workflows to analyze various types of data, encompassing omics and targeted measurements. We will provide insights into how we effectively reported its use in our published manuscripts.
- 18:00–19:30 🎉 Guix install party (optional)
Thursday morning will feature tutorials about Guix and related development tools, given by highly experienced people. In the afternoon, research software engineers, system administrators, and scientists will share their experience with Guix in HPC and research.
- 08:30–09:00 👋 Welcome back!
Software Heritage, the world's largest source code archive, is designed to collect, preserve and share source code for the long term. After an overview of the Software Heritage mission and infrastructure, we'll discover how Software Heritage created a unique and powerful system of source code referencing, using the SWHID standard. Then a demo/tutorial will show you how to archive and reference your own source code using Software Heritage tools and infrastructure.
- 10:00–10:15 ☕ break
- 10:15–11:30 tutorial session 1: amphitheater 📁
This tutorial, intended for a novice audience, aims to get you started with the GitLab forge and a version control system (Git). We will see what a forge is and why it is today an essential tool for research reproducibility. At the end of this tutorial you will know how to use the main tools offered by the forge as well as the basic git commands to manage one or several projects. Short demonstrations will range from a simple one illustrating the git commands to a more advanced example working with several branches on a code development project.
- 10:15–11:30 tutorial session 2: classroom Advanced GitLab – How to set up Continuous Integration in your GitLab projects, another step towards software reproducibility 📁
In software engineering, Continuous Integration (CI) is a practice that consists of systematically checking the impact of every source code modification on operation, performance, etc. via an automated execution pipeline. Combined with Docker or equivalent tools, GitLab offers very practical and powerful procedures and tools to implement continuous integration in your projects. In this tutorial, we will describe the CI setup in a software project and show how it helps, to some extent, to ensure software reproducibility.
- 11:30–12:45 tutorial session 1: amphitheater 📁
Jupyter notebooks are excellent pedagogical and methodological tools for explaining concepts and reasoning involving the processing of digital data, making it possible to include formatted text, multimedia elements and software code in a single interface.
However, setting up a notebook execution environment can sometimes be tedious and time-consuming. In this tutorial, we'll look at how notebooks can be put to good use and how a GitLab project combined with technologies such as JupyterLite or BinderHub can greatly simplify the provision of Jupyter notebooks.
- 11:30–12:45 tutorial session 2: classroom 📁
In computer science in general and in HPC in particular, reproducibility of a research study has always been a complex matter. On the one hand, rebuilding exactly the same software environment on various computing platforms and over extended periods of time can be long, tedious, and sometimes virtually impossible to do manually. On the other hand, while the experimental method is usually explained in research studies, the instructions required to reproduce the study from A to Z should also be provided in a comprehensive manner.
In this tutorial, following a brief presentation of the context and motivations, we will introduce the principles of literate programming with Org mode and learn how we can take advantage of it in combination with Guix to build a reproducible experimental study. Assuming knowledge of some basics of Guix (searching for and installing packages, spawning simple environments with the guix shell command), we will use it during the hands-on session to manage the software environment of an example study thanks to the guix time-machine command and a more advanced usage of the guix shell command including manifests and package transformations. Then, we will rely on the literate programming paradigm and use Org mode not only to write the study itself, but also to describe all the elements and instructions needed to reproduce it. This includes the experiments, source code and procedures involved in the construction of experimental software environments, execution of benchmarks, gathering and post-processing of results and production of the final publication(s).
For the hands-on session, the participants will need to bring a personal computer on which they have installed GNU Guix 1.4.0 beforehand (see instructions). To store the software environment and the experimental study, the participants should have around 20 GiB of free space on the / (root) partition.
- 13:00–14:00 🍽 lunch
Almost a decade ago, a Berlin research institute looked for help in compiling scientific software for bioinformatics researchers. A systems engineer from Shanghai with a yearning for a simpler life answered the call. Little did he know that his long-repressed infatuation with a quaint programming language would soon resurface, sparking his fantastically deterministic journey away from the traditions of the sysadmin tribe and just beyond the brink of the cutting edge.
This is the story of a quest for predictability, reproducibility, and stability through unpredictable means, ad-hoc hacks, and an embrace of the quirky. Based on actual events.
Keywords: research and HPC; Guix project infrastructure; round PiGx and square holes; Guix Workflow Language; the unreasonable allure of simple abstractions.
- 14:45–15:30 Arun Isaac (Department of Genetics, Evolution & Environment, University College London, United Kingdom) 📁📺
Scientific software is increasingly complex, but is developed on a shoestring budget. Maintaining a reproducible development environment for all developers and running robust deployments is challenging, to say the least. Wouldn't it be nice to have a tool that does it all and does so correctly?
The traditional Unix way to deploy complex web applications and provision servers is to manually mutate configuration files on the server. Such an approach is brittle, time-consuming, and hard to migrate to new machines. Tools as varied as Ansible and Docker have been developed to ease this process, but these tools are still mutation-based and their abstractions leak in unexpected ways. Guix, with its "functional" package deployment, provides the watertight abstractions necessary to express complex deployments with precision.
In this talk, I will present how we deploy development and production environments using Guix at genenetwork.org. I will show how we use Guix channels to distribute our own packages and services; and how we run continuous integration and deployment (CI/CD) using Guix. I will explain how this enables us to further software quality in science, and will hopefully be able to convince you to use more Guix in your team.
PsychNotebook was a web platform for students and scientists providing access to shareable and reproducible programming environments including RStudio and JupyterLab. It was developed and operated by the Leibniz Institute for Psychology between 2019 and 2023.
In this talk I will review why PsychNotebook was built, which components we used and built ourselves, why we chose them and how they interacted with each other as well as how the platform was kept running. I will also discuss why, ultimately, the service was shut down and what can be learned from its technical and organizational design.
- 16:15–16:30 🍪 break
GLiCID is the HPC center for research in the French region of Pays de la Loire, merging the various pre-existing HPC centers in the region.
The installation of new machines in June 2023 has led to the launch of a brand new common system infrastructure (identity management, Slurm services, databases, etc.), independent (as far as possible) of the solutions provided by the manufacturers.
Installed across two remote machine rooms, the infrastructure has to be highly available, which implies a complex deployment. However, the team wanted to guarantee simple, predictable redeployment of the infrastructure in the event of problems.
Guix, offered as standard on all our clusters, has a proven track record of reproducibility, which is also a desirable feature for our infrastructure. We have therefore tried to build it with Guix. We'll be reporting on the impact of these choices (both positive and negative), and on why 100% Guix coverage has not yet been achieved.
High-performance computing (HPC) often requires multiple software packages, each optimised for the target machine on which the computations run. The optimisation constraints are such that it is widely accepted that this software can only be deployed manually or by relying on the work of the administrators of the target machine (typically via a load module). However, the complexity of the dependencies often results in strong constraints on the exact functionality, version and build processes of the requested software. As a result, many HPC codes choose to provide some functionality themselves that could in principle be provided by third-party libraries, contrary to the canons of software engineering.
In this talk, we will first review our quest (CMake, Spack, and now Guix) for an environment to reliably deploy HPC software in a portable, high-performance, and reproducible way, so that the use of third-party libraries is no longer a concern. We will then present our experience of deploying such a complex software stack on several machines in the French and European ecosystem. We will show that we have been able to ensure a robust deployment while achieving top performance, not only on machines with Guix available but also on supercomputers where Guix is not (yet!) available.
- 19:30–22:22 🍽️ dinner (optional)
Friday morning will conclude the events with tutorials to get you up to speed with Guix—from installation to packaging and cluster administration.
- 09:00–10:30 tutorial session 3: amphitheater 📁
Curious about Guix but haven’t yet had a chance to give it a try? Coming from apt, Spack, Conda, pip? Wondering whether it meets your specific needs?
This tutorial aims to get you started with Guix. We will start with the main commands to manage software with Guix, including on-demand environments with guix shell. We will discuss the technicalities and gotchas one needs to be aware of, from environment variables to pre-built binaries to containerization. We will introduce channels, how they let you extend the package collection of Guix, and how they let you pin your complete software environment so you can reproduce it elsewhere and at different points in time.
- 09:00–10:30 tutorial session 4: classroom Simon Tournier (Université Paris-Cité, Institut de Recherche Saint Louis, Inserm US53, CNRS 2030, France) 📁
This tutorial reviews what can be done when listing dependencies and/or declaring a build system is not enough. The aim is to introduce various mechanisms for adapting the base Guix recipe. The prerequisite is to have read the section “Defining Packages” of the manual, and the goal of this tutorial is to provide some ingredients for putting that reading on a sound footing. We propose to first introduce a Scheme/Guile Swiss Army knife toolbox, then to cover how to modify upstream source code (field origin) and how to customize the build system parameters or phases (field arguments). If time allows, we will introduce the meaning of cryptic symbols such as the sequence #~(#$(%.
Do not forget that packaging is a craft, so there is no magic, only practice.
- 10:30–10:45 ☕ break
- 10:45–12:15 tutorial session 3: amphitheater 📁
Guix is based on a full-blown programming language, Scheme. But after overcoming our initial culture shock of meeting parentheses in all the wrong places, we will see that defining a first package is actually a piece of cake by just using one of the many existing package importers. It turns out that simple packages are declared by a simple text file in an intuitive format. Mechanisms in Guix make it possible to enrich the set of existing packages with our own creations. Finally we will see how to give back to the Guix community by submitting our recipe for inclusion into the official Git repository.
The aim of this workshop is to let you go home with your own œuvre, so please bring an idea for a (simple) piece of software to package. Depending on time and the problems we encounter, we may discuss more advanced packaging strategies that require a bit of Scheme code.
- 10:45–12:15 tutorial session 4: classroom 📁
Guix is a useful tool for empowering the users of an HPC cluster to install and customize applications themselves. We'll see how to set up Guix as a software environment for an HPC cluster and cover the important things to know about administering this service.
- 13:00–14:00 🍽 lunch