The book Evolutionary Genomics was published in July this year. Of particular interest to Guix-HPC is the chapter entitled “Scalable Workflows and Reproducible Data Analysis for Genomics”, by Francesco Strozzi et al.:
In the quest for truly reproducible workflows I set out to create an example of a reproducible workflow using GNU Guix, IPFS, and CWL. GNU Guix provides content-addressable, reproducible, and verifiable software deployment. IPFS provides content-addressable storage, and CWL describes workflows that can run on specifically supported backend hardware system. In principle, this combination of tools should be enough to provide reproducibility with provenance and improved security.
December 2018 the Akalin lab at the Berlin Institute of Medical Systems Biology (BIMSB) published a paper about a collection of reproducible genomics pipelines called PiGx that are made available through GNU Guix. The article was awarded third place in the GigaScience ICG-13 Prize. Representing the authors, Ricardo Wurmus was invited to present the work on PiGx and Guix in Shenzhen, China at ICG-13.
Ricardo urged the audience of wet lab scientists and bioinformaticians to apply the same rigorous standards of experimental design to experiments involving software: all variables need to be captured and constrained. To demonstrate that this does not need to be complicated, Ricardo reported the experiences of the Akalin lab in building a collection of reproducibly built automated genomics workflows using GNU Guix.
Due to technical difficulties the recording of the talk was lost, so Ricardo re-recorded the talk a few weeks later.
I’m happy to announce that the bioinformatics group at the Max Delbrück Center that I’m working with has released a preprint of a paper on reproducibility with the title Reproducible genomics analysis pipelines with GNU Guix.